Abstract

This paper presents a general class of dynamic stochastic optimization problems we refer to 
as Stochastic Depletion Problems. A number of challenging dynamic optimization problems of 
practical interest are stochastic depletion problems. Optimal solutions for such problems are 
difficult to obtain, both from a pragmatic computational perspective and from a theoretical
perspective. As such, simple heuristics are highly desirable. We isolate two simple properties
that, if satisfied by a problem within this class, guarantee that a myopic policy incurs a
performance loss of at most 50% relative to the optimal adaptive control policy for that problem.
We are able to verify that these two properties are satisfied for several interesting families of
stochastic depletion problems and, as a consequence, identify efficient near-optimal control
policies for a number of dynamic stochastic optimization problems.
1 Introduction



This paper presents a general class of dynamic stochastic optimization problems we refer to as 
Stochastic Depletion Problems. A number of challenging problems of practical interest are stochastic 
depletion problems. In fact, even special deterministic cases of some of these problems have been 
the focus of a great deal of recent research. Optimal solutions for such problems are difficult to 
obtain, both from a pragmatic computational perspective and from a theoretical perspective. As
such, simple heuristics are highly desirable. In this paper, we identify two simple properties that, if
satisfied by a stochastic depletion problem, guarantee that a myopic policy has expected value within
a factor of 2 of the optimal adaptive control policy for that problem. Such a myopic policy simply
attempts to maximize one-period rewards (a notion we will formalize) and ignores information about
the future evolution of the system, which makes it practically attractive for many applications. We are
able to verify that these two properties are satisfied for several interesting families of stochastic
depletion problems and, as a consequence, identify efficient near-optimal control policies for a number
of dynamic stochastic optimization problems, which we will highlight shortly.

Informally, a stochastic depletion problem is specified by item types and activity sets. The use 
of an activity results in the depletion of items of various types. In particular, the number of items of
a particular type depleted at any point in time is randomly distributed according to a distribution
specific to the activity employed at that time and the number of items of that type available. The
sufficient statistics of these distributions are themselves specified by exogenous stochastic processes. 
New items of a given type may appear and existing items depart according to exogenous stochastic 
processes. Item depletion generates rewards, and activities must be selected adaptively over time to 
accomplish such depletion. An adaptive activity selection policy in this framework has knowledge 
of system dynamics and at any given time must select an activity. An optimal such policy generates 
maximum total expected reward. 

We identify two simple properties, which we refer to as Value Function Monotonicity (VFM) and the
Immediate Rewards (IR) property, that, if satisfied by a stochastic depletion problem, guarantee
that a myopic policy generates expected reward within a factor of 2 of the optimal adaptive policy
for that problem; that is, the myopic policy is a 2-approximation algorithm for that problem. This
policy is allowed to use all the information available up to the current point in time and maximizes
expected reward earned over the following time step. Both properties are intuitive: the VFM
property states that the optimal total expected reward (or value) accrued in the future starting 
from a particular state of the system is non-decreasing in the vector of available items at that state. 
In the other direction, the IR property states that the additional value gained by making available 
additional items at a particular state of the system is at most the reward earned for the depletion 
of those items. 

We are able to verify both the VFM and IR properties for large families of stochastic depletion 
problems. These include stochastic depletion problems for which the total reward for items depleted 
over time is given by a non-decreasing submodular function of the vector of items depleted. An 
available item of a given type is depleted independently of all other items in the system with a
probability that depends on time, the item type and the activity employed. We refer to these
as Submodular Stochastic Depletion Problems. We are also able to address families of problems 
where the reward earned for depleting an item is a non-increasing function of the time of depletion, 
specific to that item's type. We refer to these as Decaying Linear Stochastic Depletion Problems. 
Our performance analysis of the myopic heuristic is sharp for both these families of stochastic 
depletion problems. 

Our systematic study of the general class of problems presented in this paper results in a number 
of contributions which we now outline: 

Stochastic Control Problems: Our framework lets us easily recognize and analyze simple near- 
optimal policies for several high-dimensional stochastic control problems for which finding optimal 
control policies is otherwise difficult. 

For instance, control problems pertaining to several interesting discrete-time queueing models 
with general arrival processes and geometric service times may easily be reduced to stochastic 
depletion problems provided one allows for service disciplines with pre-emption. As an example, 
we consider a discrete-time equivalent of a well-studied 'call-center' queueing model that has been an
important subject of recent research (see, for instance, Harrison and Zeevi (2005)).
Finding optimal service policies for such problems is typically challenging. We identify
a simple myopic policy that is near-optimal for a broad class of performance metrics.
Our policy is analogous to the so-called 'cμ' scheduling rules (see, for instance, van Mieghem).

As another example we consider problems of dynamic 'product line design'. These are problems
where a firm must dynamically adjust the assortment of products it offers for sale so as to
maximize expected revenues. Sales of a given product are influenced by the entire assortment of
products offered as well as the prevailing market sizes for various customer segments. This
represents an important generalization of static product line design problems of the type considered
by van Ryzin and Mahajan (1999). Our analysis yields the surprising conclusion that the optimal
policy is well approximated by solving a sequence of static product line design problems.
Online algorithms for stochastic variants of well-studied deterministic problems: We
show that a myopic policy earns expected rewards that are within a small constant factor of the
optimal adaptive policy for what we believe to be important online stochastic generalizations of
a number of problems typically studied in deterministic settings. The stochastic generalizations
we present go beyond what may be modeled in the traditional online versions of these problems 
and incorporate features we view as highly desirable from a modeling perspective. We present 
approximation guarantees for these generalizations that are typically no worse (and sometimes, 
better) than the best known guarantees for their deterministic counterparts. 

For instance, we are able to provide an efficient myopic policy that is a 2-approximation algorithm
for a stochastic broadcast scheduling problem. Successful data transmission in our broadcast
scheduling model is stochastic, which makes it naturally applicable in several communications
engineering contexts. With a bound on the maximal number of simultaneous broadcasts, the best known
scheme for the deterministic broadcast scheduling problem (where page transmission is successful
with probability 1) is an offline 4-approximation (Bar-Noy et al. (2002)). With no constraints on



ministic broadcast scheduling problem is a 4/3-approximation due to 
) and the


ition due to 
(J200J). Our contribution to this body of work is to demonstrate that even with stochasticity in
data transmission one can still achieve good constant factor performance guarantees via an adaptive 
myopic algorithm. 

As another example, we consider a stochastic generalization of the AdWords Assignment problem
(Fleischer et al. (2006)) where revenues are generated via stochastic clicks on placed ads as
opposed to simply via the placement of an ad as in the deterministic version of the problem. Using
an appropriate approximation algorithm as a subroutine for the myopic problem, we demonstrate
an efficient (3 + ε)-approximate myopic scheme, which matches the best known (3 + ε)-approximation
algorithm for the deterministic version of the AdWords Assignment problem due to Goundan and
Schulz (2007).
A unified framework: We provide an elegant, unified framework for the design and analysis
of policies for stochastic optimization problems that is analogous to that for submodular maximization
over simple matroids. In particular, submodular maximization problems over several simple matroids
(such as the cardinality and partition matroids) can be captured as submodular stochastic depletion
problems. A number of interesting problems (such as the AdWords Assignment problem) are
known to be examples of such submodular optimization problems (see Goundan and Schulz (2007),
for example). The stochastic depletion framework provides a natural vehicle for the analysis of
stochastic variants of such problems, wherein the notion of selecting a set element translates to an
attempt at selection; the success of such an attempt is specified by an exogenous stochastic process.

We believe that the characterization of dynamic stochastic optimization problems that admit 
simple near-optimal control policies provided in the present work is likely to allow for a simple 
analysis of many problems beyond the handful of examples we have alluded to above. In particular, 
the two abstract properties that guarantee the effectiveness of a myopic policy are typically not hard 
to recognize and could potentially be established for families of problems outside those discussed 
here. The remainder of this paper is organized as follows: In Section 2 we formally specify the class
of stochastic depletion problems. Section 3 presents a myopic heuristic for stochastic depletion
problems and identifies two simple properties (the VFM and IR properties) that, if satisfied by
a stochastic depletion problem, guarantee that a myopic policy is a 2-approximation algorithm for
that problem. Section 4 verifies the VFM and IR properties for two general families of stochastic
depletion problems, Submodular Stochastic Depletion problems and Decaying Linear Stochastic
Depletion problems, while the following two sections discuss a number of applications that lie within
these families. Section 7 concludes with a perspective on interesting directions for further work.

2 Model 

We are given a collection of items, each of which belongs to one of $M$ types indexed by $m$. There
can be at most $\bar{x}_m$ items of type $m$ available at any time. Items are depleted via the execution
of a suitable activity from a set of feasible activities $\mathcal{A}$, and depletion of a set of items garners
a non-negative reward we will formalize shortly. Time is discrete (indexed by $t \in [0, T]$) and in
each time step one must choose to employ some activity from $\mathcal{A}$; we let $T$ denote the length of
the time horizon. We let $i$ index the elements of $\mathcal{A}$ and denote a general element of $\mathcal{A}$ by $A$.
Let $x_{t,m}$ denote the number of items of type $m$ that remain at the start of the $t$th time-step.
Assuming one chooses action $A \in \mathcal{A}$ in the $t$th time-step, the number of items of each type $m$
depleted within that time-step is given by a $\times_{m=1}^{M}\{0, \ldots, x_{t,m}\}$-valued random vector, $X^A_t$,
whose sufficient statistics at time $t$ are specified by an exogenous stochastic process $\{P_t(A)\}$
specified for all $A$ and taking values in some compact set $\mathcal{K}$. $X^A_t$ is assumed independent of the
past given $P_t(A)$ and the vector of non-depleted items $x_t$. For example, we may have that $\{P_t(A)\}$
is a $[0, 1]^M$-valued stochastic process and, assuming one chooses action $A$ at time $t$, $X^A_{t,m}$ is an
independent Binomial-$(x_{t,m}, P_{t,m}(A))$ random variable. In what follows we drop the superscript $A$
from $X^A_t$ for economy, as the dependence on $A$ will be clear from context. We have
$x_{t+1,m} = x_{t,m} - X_{t,m}$ for all $m$, and receive a total reward of $g(x_t, x_{t+1}, t)$, where
$g : \mathbb{Z}_+^M \times \mathbb{Z}_+^M \times [0, T] \to \mathbb{R}_+$ satisfies:

Assumption 1. For all $x, x' \in \mathbb{Z}_+^M$, $g(x, x', \cdot)$ is a non-increasing, non-negative function. In
addition, we assume $g(x, x', T) = 0$ for all $x, x' \in \mathbb{Z}_+^M$.
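To make the one-step dynamics concrete, the following sketch enumerates the exact law of $X_t$ in the Binomial example above (independent types, $P_t(A) = p$) and computes the expected one-period reward $\mathbb{E}[g(x_t, x_{t+1}, t)]$. The linear reward `g_linear`, the weights `w` and the horizon `T` are hypothetical choices for illustration only; any `g` satisfying Assumption 1 could be passed instead.

```python
from itertools import product
from math import comb


def depletion_law(x, p):
    """Exact law of X_t when X_{t,m} ~ Binomial(x_m, p_m), independently across types m.

    Returns a dict mapping each depletion vector k (a tuple) to its probability.
    """
    law = {}
    for k in product(*(range(n + 1) for n in x)):
        prob = 1.0
        for n, km, pm in zip(x, k, p):
            prob *= comb(n, km) * pm**km * (1 - pm) ** (n - km)
        law[k] = prob
    return law


def expected_reward(x, p, t, g):
    """E[g(x_t, x_{t+1}, t)] for one period, using x_{t+1} = x_t - X_t."""
    total = 0.0
    for k, prob in depletion_law(x, p).items():
        x_next = tuple(n - km for n, km in zip(x, k))
        total += prob * g(x, x_next, t)
    return total


# Hypothetical instance: g(x, x', t) = sum_m w_m (x_m - x'_m) for t < T and g = 0 at t = T,
# so g is non-increasing in t and vanishes at T, as Assumption 1 requires.
T = 5
w = (3.0, 1.0)


def g_linear(x, x_next, t):
    if t >= T:
        return 0.0
    return sum(wm * (a - b) for wm, a, b in zip(w, x, x_next))
```

For example, `expected_reward((2, 1), (0.5, 0.2), 0, g_linear)` equals $3 \cdot 2 \cdot 0.5 + 1 \cdot 1 \cdot 0.2 = 3.2$ up to floating-point rounding: with a linear reward only the means $x_{t,m} P_{t,m}(A)$ matter.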

Our objective is to design an adaptive scheduling policy that maximizes total expected reward
earned within the first $T$ time-steps. We define as our state-space the set

$$\mathcal{S} = \{(x, t, p_1, p_2, \ldots, p_{|\mathcal{A}|}) : x \in \times_{m}\{0, 1, \ldots, \bar{x}_m\},\ 0 \le t \le T,\ p_i \in \mathcal{K}^{t+1}\ \forall i\}.$$

In particular, a state is associated with a vector of items remaining to be depleted, time and
a history of the $P_t$ processes. We denote by $x(s)$ the projection of $s$ onto its first co-ordinate and
similarly employ the notation $t(s)$ and $p_i(s)$ for $i = 1, 2, \ldots, |\mathcal{A}|$. We let the random variable
$S_t \in \mathcal{S}$ denote the state in the $t$th epoch.

Finally, we define the random reward function $R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}_+$ according to
$R(s, A) = g(x(s), x(s) - X_{t(s)}, t(s))$, where $X_{t(s)}$ is a random vector with sufficient statistic
$P_{t(s)}(A)$. Now since at time $t$ the realization of $X_t$ from taking a particular action is unknown, any
control policy is a priori unaware of the exact reward accrued from a particular action; only the
statistics of this reward are known. More specifically, a control policy $\pi$ is a mapping from $\mathcal{S}$ to the
set of feasible activities $\mathcal{A}$, and we denote by $\Pi$ the set of all such policies. Define the expected
total reward-to-go under a policy $\pi$ starting at state $s$ according to:

$$J^{\pi}(s) = \mathbb{E}\left[\sum_{t'=t(s)}^{T-1} R(S_{t'}, \pi(S_{t'})) \,\middle|\, S_{t(s)} = s\right].$$

We let $J^*(s) = \max_{\pi \in \Pi} J^{\pi}(s)$ denote the expected total reward-to-go under the optimal policy
$\pi^*$, given by $\pi^* \in \arg\max_{\pi \in \Pi} J^{\pi}(s)$. We will refer to the problem of finding such an optimal
policy $\pi^*$ as a Stochastic Depletion Problem.
We remark that our formulation permits modeling exogenous item arrivals and deadlines on the
latest time of depletion for a given item. In particular, assuming without loss that $\bar{x}_m = 1$, that is,
a given item type can have at most a single item (otherwise, we could simply refine the definition
of a type), we associate with each type an arrival time $r_m$ and a deadline $d_m > r_m$. One may then
assume that the $P_t$ processes are such that $X_{t,m} = 0$ a.s. for all $A \in \mathcal{A}$ if $t \notin [r_m, d_m]$, in order to
model the fact that item type $m$ arrives at time $r_m$ and may not be depleted beyond time $d_m$; see
Section 5.2 for a concrete illustrative example. Such a formulation succinctly assumes a (known)
bound on the total number of arrivals in any given period.
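A minimal sketch of the arrival/deadline construction, assuming the Binomial model with a single item per type: the helper takes a hypothetical base sequence of depletion probabilities and zeroes out $P_{t,m}(A)$ for every activity whenever $t \notin [r_m, d_m]$, which forces $X_{t,m} = 0$ almost surely outside the window (a Binomial$(1, 0)$ variable is identically 0). The names `base` and `windows` are illustrative and not part of the model.

```python
def windowed_probabilities(base, windows):
    """Clairvoyant P_t sequence with depletion of type m disabled outside [r_m, d_m].

    base[t][i][m] -- hypothetical depletion probability of type m under activity i at time t
    windows[m]    -- (r_m, d_m): arrival time and deadline of the single item of type m
    """
    return [
        [
            [p if windows[m][0] <= t <= windows[m][1] else 0.0 for m, p in enumerate(p_row)]
            for p_row in activities
        ]
        for t, activities in enumerate(base)
    ]
```

The windows are treated as closed intervals, matching "may not be depleted beyond time $d_m$"; the base probabilities inside each window are passed through unchanged.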

The optimal reward-to-go function (or value function) $J^*$ and the optimal scheduling policy $\pi^*$
can in principle be computed via dynamic programming. In particular, letting $S(s, A)$ denote the
random next state encountered upon employing activity $A$ in state $s$, define the dynamic
programming operator $\mathcal{H}$ according to:

$$(\mathcal{H}J)(s) = \max_{A \in \mathcal{A}} \mathbb{E}\left[R(s, A) + J(S(s, A))\right] \tag{1}$$

for all $s \in \mathcal{S}$ with $t(s) \le T - 1$. $J^*$ may then be found as the solution to the Bellman equation
$\mathcal{H}J = J$, with the boundary condition $J(s') = 0$ for all $s'$ with $t(s') = T$. The optimal policy
$\pi^*$ may be found as the greedy maximizer with respect to $J^*$ in (1). Of course, this approach is
computationally intractable: even in the event that the $P_t$ processes are known a priori, the state
space (the set of all $(x, t)$) has $(T + 1)\prod_m (\bar{x}_m + 1)$ elements, which is exponentially large in $M$.
As such, solving a general stochastic depletion problem is pragmatically difficult.
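The recursion (1) can be sketched directly for a tiny clairvoyant instance, assuming the Binomial model and known sequences `P[t][i]` (the vector $P_t(A_i)$), so that a state reduces to $(x, t)$. The memo table can hold up to $(T+1)\prod_m(\bar{x}_m+1)$ entries and every evaluation enumerates all depletion vectors, which is exactly the blow-up described above. The function name and the instance in the note below are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb


def make_value_function(P, T, g):
    """Return J(x, t): the optimal clairvoyant value computed via the Bellman recursion (1).

    P[t][i] is the known vector P_t(A_i) of per-type depletion probabilities;
    the boundary condition J(x, T) = 0 is enforced explicitly.
    """

    def next_states(x, p):
        # Enumerate (x_{t+1}, probability) under independent Binomial depletions.
        for k in product(*(range(n + 1) for n in x)):
            prob = 1.0
            for n, km, pm in zip(x, k, p):
                prob *= comb(n, km) * pm**km * (1 - pm) ** (n - km)
            yield tuple(n - km for n, km in zip(x, k)), prob

    @lru_cache(maxsize=None)
    def J(x, t):
        if t >= T:
            return 0.0
        # (HJ)(s) = max_A E[R(s, A) + J(S(s, A))]
        return max(
            sum(prob * (g(x, y, t) + J(y, t + 1)) for y, prob in next_states(x, p))
            for p in P[t]
        )

    return J
```

For instance, with two types holding one item each, weights $(2, 1)$, $T = 2$ and two activities that deplete type 1 or type 2 with certainty, $J((1, 1), 0) = 3$: both items can be cleared within the horizon.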

In addition to the above informal argument for why we might expect finding an optimal
solution to be a difficult task, one may easily see that special classes of stochastic depletion problems
are NP-hard. We consider one such class here for completeness. Consider 'rational clairvoyant'
stochastic depletion problems where the $P_t$ sequences are rational-valued deterministic sequences
in $[0, 1]^M$. $X_{t,m}$ is a Binomial-$(x_{t,m}, P_{t,m}(A))$ random variable for all $t, m, A$, independent of the
past and of $X^{A'}_{t,m'}$ for all $m' \ne m$, $A' \ne A$. We assume $g(x_t, x_{t+1}, t) = \sum_m w_m (x_{t,m} - x_{t+1,m})$, where
$w_m \ge 0$ for all $m$. The input to such a problem consists of $|\mathcal{A}|$ rational-valued sequences of length
$T$, and $2M$ rational numbers representing the initial number of items of each type and the reward
constants $w_m$ for each type. One may then construct a polynomial-time reduction from the
set-cover decision problem (which is NP-complete) to the rational clairvoyant stochastic depletion
problem. This is formalized in the following result:

Theorem 1. The rational clairvoyant Stochastic Depletion problem is NP-hard. 

Proof: We reduce the Set-Cover decision problem to the rational clairvoyant Stochastic Depletion
problem. An instance of the Set-Cover decision problem is specified by a ground set $U$, a collection of
covering sets $B \subseteq 2^U$ and an integer $k$ (where $k \le |U|$ without loss), and we must decide whether a
cover (that is, a subset of $B$ whose union is a superset of $U$) of size $\le k$ exists. We reduce this
question to the optimal solution of the following rational clairvoyant stochastic depletion instance. We
consider a problem with $|U|$ item types, and assume we have a single item of each type. We let each
set in $B$ correspond to a feasible activity in the sense that the use of that activity results in the
depletion of all items in that set with probability 1 and the depletion of items outside that set
with probability 0 in any time slot. We let the depletion of a single item result in unit reward,
$w_m = 1$, and assume that the time horizon for scheduling is $k$. Assuming a polynomial-time
algorithm for rational clairvoyant stochastic depletion, the reduced problem would require time
that is $\mathrm{poly}(|U|, |B|, k) = O(\mathrm{poly}(|U|, |B|))$. If the optimal solution to this instance of the stochastic
depletion problem has total reward $|U|$, we know that there exists a set cover of size $\le k$. Conversely,
if there exists a set cover of size $\le k$, then there exists a depletion policy with total reward $|U|$.
Our reduction is thus many-one and polynomial in the size of the input. This completes the proof.
□ 
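A small sketch of the reduction in the proof of Theorem 1, assuming sets are given as Python sets over a sortable ground set: each element of $U$ becomes an item type with one item, each set in $B$ becomes an activity whose depletion probabilities are 1 on its members and 0 elsewhere, and the horizon is $k$. Because every probability is 0 or 1 the instance is deterministic, so its optimal reward can be found by brute force over activity sequences. The helper names are hypothetical and the brute force is exponential; it only serves to check the claimed equivalence on toy inputs.

```python
from itertools import combinations, product


def reduction_instance(universe, sets, k):
    """Encode a Set-Cover instance (U, B, k) as a rational clairvoyant depletion instance."""
    elements = sorted(universe)
    row = [[1.0 if u in B else 0.0 for u in elements] for B in sets]  # P_t(A_j), same for all t
    return elements, [row] * k, k


def optimal_reward(elements, P, T):
    """Optimal total reward (unit weights) by enumerating all activity sequences of length T."""
    best = 0
    for seq in product(range(len(P[0])), repeat=T):
        depleted = {m for t, j in enumerate(seq) for m in range(len(elements)) if P[t][j][m] == 1.0}
        best = max(best, len(depleted))
    return best


def has_cover(universe, sets, k):
    """Direct check: is there a subfamily of at most k sets whose union contains U?"""
    return any(
        set().union(*choice) >= set(universe)
        for r in range(1, k + 1)
        for choice in combinations(sets, r)
    )
```

On any toy input, `optimal_reward(...) == len(U)` should agree with `has_cover(U, B, k)`, which is the equivalence the proof relies on.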

We remark that the above reduction can also be used to reduce an interesting optimization
problem related to set covering, namely that of maximum set coverage (where one may pick at
most $k$ elements of $B$ so as to cover as many elements of $U$ as possible), to a stochastic depletion
instance. In fact, this reduction is a special case of a useful set of reductions we will explore in
Section 

We have, in this section, introduced a general class of dynamic stochastic optimization problems 
that, as we shall see in later sections, admit a number of interesting applications. Computing optimal
solutions for such problems is evidently hard; the next section will present and analyze a natural,
simple-to-implement heuristic for such problems.

3 A Myopic Heuristic for Stochastic Depletion 

A natural heuristic policy one may consider for a stochastic depletion problem is given by the
myopic policy, which in state $s$ chooses an activity $A$ that maximizes expected reward earned
over the following time-step. That is,

$$\pi^g(s) \in \arg\max_{A \in \mathcal{A}} \mathbb{E}[R(s, A)].$$

Such a policy is adaptive but ignores the evolution of the system and the impact of the present
choice of activity on rewards in future states. The set $\mathcal{A}$ in the myopic problem above is potentially
exponentially large. In many cases, however, this set has an implicit polynomial-sized representation
(for instance, $\mathcal{A}$ may correspond to a set of matchings) and the myopic maximization problem is
efficiently solved. We will later also address the case where the myopic maximization problem is
difficult but one has access to an appropriate near-optimal oracle.
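The myopic rule can be sketched as follows for the Binomial example of Section 2, assuming a small, explicitly listed activity set `P[t]` (one vector $P_t(A)$ per activity) rather than the implicit representations mentioned above. `one_step_reward` evaluates $\mathbb{E}[R(s, A)]$ exactly and `myopic_action` returns an index attaining the maximum; both names, the reward `g` and the instance below are hypothetical.

```python
from itertools import product
from math import comb


def one_step_reward(x, p, t, g):
    """E[R(s, A)] = E[g(x, x - X_t, t)] with X_{t,m} ~ Binomial(x_m, p_m) independently."""
    total = 0.0
    for k in product(*(range(n + 1) for n in x)):
        prob = 1.0
        for n, km, pm in zip(x, k, p):
            prob *= comb(n, km) * pm**km * (1 - pm) ** (n - km)
        total += prob * g(x, tuple(n - km for n, km in zip(x, k)), t)
    return total


def myopic_action(x, t, P, g):
    """pi^g: index of an activity maximizing one-period expected reward (ties -> lowest index).

    Nothing about later periods is consulted, which is what makes the rule cheap.
    """
    scores = [one_step_reward(x, p, t, g) for p in P[t]]
    return max(range(len(scores)), key=scores.__getitem__)
```

With weights $(2, 1)$, state $x = (1, 1)$ and activities $P_0 = [(0.4, 0), (0, 0.9)]$, the scores are $0.8$ and $0.9$, so the myopic rule picks the second activity even though it targets the less valuable type.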

3.1 The Myopic Heuristic is an Online 2-Approximation Algorithm: Sufficient Conditions

Our objective in this section will be to identify stochastic depletion problems for which the myopic 
heuristic is near-optimal. In particular, we will identify stochastic depletion problems for which we 
will have, for any state $s \in \mathcal{S}$, $J^*(s)/J^{\pi^g}(s) \le 2$.
Noting that the myopic heuristic does not utilize any information about the evolution of the
$P_t$ processes, we will simply assume that these are a priori given sequences. In particular, we
will compare the performance of the myopic heuristic to that of an optimal clairvoyant algorithm
that knows the realizations of the $P_t$ processes a priori. Since an optimal clairvoyant policy must
dominate the optimal policy, it will suffice to demonstrate performance guarantees relative to the
optimal clairvoyant policy. Such an optimal clairvoyant policy may be computed over a reduced
state-space:

$$\mathcal{S} = \{(x, t) : x \in \times_{m}\{0, 1, \ldots, \bar{x}_m\},\ 0 \le t \le T\}.$$
In the sequel, we will only consider such clairvoyant optimal policies; any reference to an optimal 
policy or value function in the sequel will pertain to an optimal policy or value function for the 
clairvoyant problem. Comparing performance to a clairvoyant policy yields performance guarantees 
that are valid over individual sample paths of the $P_t$ processes. In particular, our guarantees will
imply that the myopic heuristic is a 2-competitive online algorithm, where the optimal scheme is
allowed knowledge of entire sample-paths of the $P_t$ processes but does not know the realization of
$X_t$ until time $t + 1$ (which is somewhat different from the typical competitive analysis setting).

We now identify two properties that, if satisfied by the optimal clairvoyant value function $J^*$,
will imply our desired approximation guarantee.

Property 1. Value Function Monotonicity: Consider states $s, s'$ satisfying $x(s) \ge x(s')$, $t(s) = t(s')$.
The VFM property requires that $J^*(s) \ge J^*(s')$. In words, all else being equal, it is advantageous
to start at a state with a greater number of items available.

Before we describe the second property we find it convenient to introduce some notation. For
$a \in \mathbb{Z}_+^M$, define a mapping $\tilde{S}_a : \mathcal{S} \to \mathcal{S}$ according to $\tilde{S}_a(s) = s'$ with $t(s') = t(s)$ and
$x(s')_m = (x(s)_m - a_m)^+$ for all $m$. $\tilde{S}_a(s)$ is thus the state obtained if one were permitted to
employ some set of activities (which presumably resulted in $X_t = a$) but without incurring the use
of a time-step.

Property 2. Immediate Rewards: For all $s \in \mathcal{S}$ and $a \in \times_m\{0, 1, \ldots, x(s)_m\}$,

$$J^*(s) \le g(x(s), x(s) - a, t(s)) + J^*(\tilde{S}_a(s)).$$

This property states that it is advantageous if one were able to deplete items without incurring
the use of a time-step. In particular, if instead of starting at some state $s \in \mathcal{S}$ one started at state
$\tilde{S}_a(s)$ and was in addition given reward for the depletion of $a$ items, this property requires that
the value of the second scenario be at least as large as the first.
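Properties 1 and 2 can be checked by brute force on toy instances, which is a useful sanity test when modelling a new family. The sketch below, assuming the Binomial model, known sequences `P[t]` and a caller-supplied reward `g`, computes the clairvoyant $J^*(x, t)$ by backward recursion and then tests VFM and IR (realizing the same-time depletion used in Property 2 as $x \mapsto x - a$ at unchanged $t$) over every state. The helper names, the tolerance and the instance are hypothetical; a passing check on one instance is evidence, not a proof.

```python
from functools import lru_cache
from itertools import product
from math import comb


def value_function(P, T, g):
    """Clairvoyant optimal value J*(x, t) by backward recursion; J*(x, T) = 0."""

    @lru_cache(maxsize=None)
    def J(x, t):
        if t >= T:
            return 0.0
        best = 0.0  # rewards are non-negative, so 0 is a valid lower bound
        for p in P[t]:
            val = 0.0
            for k in product(*(range(n + 1) for n in x)):
                prob = 1.0
                for n, km, pm in zip(x, k, p):
                    prob *= comb(n, km) * pm**km * (1 - pm) ** (n - km)
                y = tuple(n - km for n, km in zip(x, k))
                val += prob * (g(x, y, t) + J(y, t + 1))
            best = max(best, val)
        return best

    return J


def check_vfm_ir(J, g, xbar, T, tol=1e-9):
    """Brute-force test of Property 1 (VFM) and Property 2 (IR) at every state (x, t)."""
    states = list(product(*(range(n + 1) for n in xbar)))
    for t in range(T + 1):
        for x in states:
            # VFM: x >= x2 componentwise and equal times => J*(x, t) >= J*(x2, t).
            for x2 in states:
                if all(a >= b for a, b in zip(x, x2)) and J(x, t) < J(x2, t) - tol:
                    return False
            # IR: deplete a <= x "for free" at time t and collect g for it.
            for a in product(*(range(n + 1) for n in x)):
                y = tuple(n - am for n, am in zip(x, a))
                if J(x, t) > g(x, y, t) + J(y, t) + tol:
                    return False
    return True
```

For a linear reward with non-negative weights both properties hold (this is a special case of the submodular family treated in Section 4), so the check should pass; a deliberately non-monotone "value function" should fail it.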

The two properties we have developed thus far for the optimal value function J* are what we 
fundamentally need to prove an approximation guarantee for the myopic heuristic. We now present 
the proof of our main approximation guarantee which assumes the VFM and IR properties. 

We first introduce relevant notation: For $a \in \mathbb{Z}_+^M$, define a mapping $\hat{S}_a : \mathcal{S} \to \mathcal{S}$ that specifies
the next state obtained if one employed an activity in state $s$ which resulted in $X_{t(s),m} = a_m \wedge x(s)_m$
for all $m$. In particular, we define $\hat{S}_a$ according to $\hat{S}_a(s) = s'$ with $t(s') = t(s) + 1$ and
$x(s')_m = (x(s)_m - a_m)^+$ for all $m$.

Theorem 2. Assuming $J^*$ satisfies Properties 1 and 2, we have for all $s \in \mathcal{S}$, $J^*(s)/J^{\pi^g}(s) \le 2$.

Proof: The proof proceeds by induction on the number of time steps that remain in the horizon,
$T - t(s)$. The claim is trivially true if $t(s) = T - 1$ since both the myopic and optimal policies
coincide in this case. Consider a state $s$ with $t(s) < T - 1$ and assume the claim true for all states
$s'$ with $t(s') > t(s)$.

Now if $\pi^*(s) = \pi^g(s)$, then the next states encountered in both systems are identically distributed,
so that the induction hypothesis immediately yields the result for state $s$. Consider the case where
$\pi^*(s) \ne \pi^g(s)$. Denote by $X^*_{t(s)}$ and $X^g_{t(s)}$ the random vectors of depleted items in period $t(s)$ under
the optimal and myopic policies, respectively, at state $s$. Let $\mathbf{0}$ denote the $M$-dimensional zero vector.
We have:



J*(s\X* {s) ,X 9 tis) )=E[R(s,7r*(s))\X; {s 
< E[R(s, 7r*(s))|X t * s 
<E[R( S) n*( a ))\X; {B 
(2) <E[R(s,w*(s))\X: {8 
= E[R(s,Tr*( S ))\X* {s 
= E[R(s,n*(s))\X; (8 
<E[R(s,7T*(s))\X* t{s 



+ J*(S (s)) 

+ g(x(s),x(s) - X° (s) ,t(s) + 1) + J%S x *JS (s))) 
+ g(x(s),x( S )-X 9 t{sV t(s)) + J%S X BjS ( S ))) 
+ E[R(s,^( S ))\Xy + J*(S x? JS (s))) 
+ E[R(s,^( S ))\Xy + r(S x? Js)) 
+ E[R( S ,ir 9 (s))\X? (s) } + 2J* B (§ x ,j8)) 



where the first inequality follows from the assumed VFM property for J* upon noting that 
x(Sx* As)) < x(So(s)). The second inequality follows from the IR property assumed of J* upon 
taking s = So(s) and a = ^t(sY third inequality follows from Assumption Q] since g was 

assumed non-increasing in time. The third equality follows from the identity S x s (So(s)) = 
S X 9 (s) which in turn is simply a consequence of the definitions of S a and S a . The final inequality 

t(s) 

follows from the induction hypothesis. 
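Now,

$$J^{\pi^g}(s) = \mathbb{E}\left[R(s, \pi^g(s)) + J^{\pi^g}(\hat{S}_{X^g_{t(s)}}(s))\right]$$

and $\mathbb{E}[R(s, \pi^g(s))] \ge \mathbb{E}[R(s, \pi^*(s))]$ by the definition of the myopic policy $\pi^g$, so that taking
expectations in (2), we have:

$$
\begin{aligned}
J^*(s) = \mathbb{E}\left[J^*(s \mid X^*_{t(s)}, X^g_{t(s)})\right]
&\le \mathbb{E}[R(s, \pi^*(s))] + \mathbb{E}[R(s, \pi^g(s))] + 2\,\mathbb{E}\left[J^{\pi^g}(\hat{S}_{X^g_{t(s)}}(s))\right] \\
&\le 2J^{\pi^g}(s).
\end{aligned}
$$

This concludes the proof.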



□ 
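Theorem 2 can be illustrated numerically. The sketch below, under a hypothetical Binomial setup with known sequences `P[t]`, evaluates both $J^*$ and the value $J^{\pi^g}$ of the myopic policy by exact recursion. In the toy instance used in the test, the myopic policy grabs a certain unit reward at $t = 0$ and thereby forfeits an expected reward of $0.9$ that is only attainable at $t = 0$, so the ratio is $1.9$, close to the factor 2 in the theorem. The function name and instance are illustrative assumptions.

```python
from functools import lru_cache
from itertools import product
from math import comb


def policy_values(P, T, g):
    """Return (J_star, J_myopic) for a clairvoyant instance with Binomial depletions."""

    def next_states(x, p):
        for k in product(*(range(n + 1) for n in x)):
            prob = 1.0
            for n, km, pm in zip(x, k, p):
                prob *= comb(n, km) * pm**km * (1 - pm) ** (n - km)
            yield tuple(n - km for n, km in zip(x, k)), prob

    def q(x, t, p, J):
        # Expected immediate reward plus expected continuation value under J.
        return sum(prob * (g(x, y, t) + J(y, t + 1)) for y, prob in next_states(x, p))

    @lru_cache(maxsize=None)
    def J_star(x, t):
        return 0.0 if t >= T else max(q(x, t, p, J_star) for p in P[t])

    @lru_cache(maxsize=None)
    def J_myopic(x, t):
        if t >= T:
            return 0.0
        # pi^g: maximize the one-period expected reward only (ties -> first activity).
        p = max(P[t], key=lambda p: sum(prob * g(x, y, t) for y, prob in next_states(x, p)))
        return q(x, t, p, J_myopic)

    return J_star, J_myopic
```

Because the instance has unit linear rewards, it is a (modular) submodular stochastic depletion problem, so the bound $J^* \le 2J^{\pi^g}$ should hold at every state, not just the initial one.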






3.1.1 Performance with an Approximate Myopic Oracle

We will subsequently encounter a number of examples for which the set $\mathcal{A}$ is exponentially large,
but admits some implicit polynomial representation allowing for efficient solutions to the myopic
problem

$$\max_{A \in \mathcal{A}} \mathbb{E}[R(s, A)].$$

Sometimes, however, this problem may itself be difficult to solve. In such scenarios, a myopic policy
that uses an oracle which is an $\alpha$-approximation for this subproblem is in fact a $(1 + \alpha)$-approximation
for the original stochastic depletion problem. In particular, assume $\pi^{\mathrm{approx}} : \mathcal{S} \to \mathcal{A}$ satisfies

$$\mathbb{E}[R(s, \pi^{\mathrm{approx}}(s))] \ge \frac{1}{\alpha} \max_{A \in \mathcal{A}} \mathbb{E}[R(s, A)]$$

for all $s \in \mathcal{S}$. One may then establish the following result, whose proof is omitted but is entirely
analogous to that of Theorem 2 above:

Theorem 3. Assuming $J^*$ satisfies Properties 1 and 2, we have for all $s \in \mathcal{S}$, $J^*(s)/J^{\pi^{\mathrm{approx}}}(s) \le 1 + \alpha$.
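For readers who want the omitted step, here is a sketch of how the induction in the proof of Theorem 2 adapts; it is our reconstruction, not the authors' proof. Write $X^{a}_{t(s)}$ for the depletion vector under $\pi^{\mathrm{approx}}(s)$ and $\hat{S}_a(s)$ for the next state after depleting $a$ in state $s$. Every step of (2) except the last uses only Properties 1 and 2 and Assumption 1, so it holds with $\pi^{\mathrm{approx}}(s)$ in place of $\pi^g(s)$. The oracle guarantee gives $\mathbb{E}[R(s,\pi^*(s))] \le \max_{A} \mathbb{E}[R(s,A)] \le \alpha\,\mathbb{E}[R(s,\pi^{\mathrm{approx}}(s))]$, and the induction hypothesis is $J^* \le (1+\alpha)J^{\pi^{\mathrm{approx}}}$ at time $t(s)+1$:

```latex
\begin{aligned}
J^*(s) &\le \mathbb{E}[R(s,\pi^*(s))] + \mathbb{E}[R(s,\pi^{\mathrm{approx}}(s))]
          + \mathbb{E}\big[J^*(\hat{S}_{X^{a}_{t(s)}}(s))\big] \\
       &\le \alpha\,\mathbb{E}[R(s,\pi^{\mathrm{approx}}(s))] + \mathbb{E}[R(s,\pi^{\mathrm{approx}}(s))]
          + (1+\alpha)\,\mathbb{E}\big[J^{\pi^{\mathrm{approx}}}(\hat{S}_{X^{a}_{t(s)}}(s))\big] \\
       &= (1+\alpha)\,J^{\pi^{\mathrm{approx}}}(s).
\end{aligned}
```

Setting $\alpha = 1$ (an exact oracle) recovers the factor 2 of Theorem 2.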

4 Families Satisfying the VFM and IR Properties 

The previous section identified two abstract properties, namely the VFM and IR properties, that, if
satisfied, yield uniform performance loss guarantees for the myopic heuristic via Theorems 2
and 3. These properties are in general difficult to check. We establish in this section two simple
yet fairly general families of stochastic depletion problems that satisfy Properties 1 and 2, thereby
guaranteeing that the myopic heuristic is a 2-approximation algorithm for those families. Although
there may certainly be other families of problems satisfying the VFM and IR properties, the families
we identify in this section accommodate a number of interesting applications, which will be the focus
of Sections 5 and 6.

4.1 Submodular Stochastic Depletion Problems 

We consider problems for which $\{P_t(A)\}$ is a $[0, 1]^M$-valued stochastic process for all $A \in \mathcal{A}$.
Assuming one chooses action $A$ at time $t$, $X_{t,m}$ is a Binomial-$(x_{t,m}, P_{t,m}(A))$ random variable that,
given $x_{t,m}$ and $P_{t,m}(A)$, is independent of the past and of $X_{t,m'}$ for $m' \ne m$. We assume submodular
rewards. In particular, we assume $g(x_t, x_{t+1}, t) = w(\bar{x} - x_{t+1}) - w(\bar{x} - x_t)$, where $w : \mathbb{Z}_+^M \to \mathbb{R}$
satisfies:

Assumption 2. $w : \mathbb{Z}_+^M \to \mathbb{R}$ satisfies:

1. (Monotonicity) $w(y) \ge w(y')$ for $y \ge y'$.

2. (Submodularity) For $e \in \mathbb{Z}_+^M$, $w(y + e) - w(y) \le w(y' + e) - w(y')$ if $y \ge y'$.
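Assumption 2 is easy to refute numerically on a finite box. The sketch below is a hypothetical helper, not something used in the paper: it checks monotonicity and the lattice diminishing-returns inequality for all $y \ge y'$ and increments $e$ with entries in $\{0, \ldots, \text{box}\}$. The example rewards are assumptions chosen for illustration: a capped linear reward $w(y) = \min(4, 3y_1 + y_2)$, which is a concave non-decreasing function of a non-negative linear form and hence satisfies Assumption 2, and the convex reward $w(y) = (y_1 + y_2)^2$, which violates submodularity.

```python
from itertools import product


def satisfies_assumption_2(w, M, box, tol=1e-12):
    """Check monotonicity and submodularity of w on the finite box {0, ..., box}^M.

    A finite check can refute Assumption 2 but cannot prove it on all of Z_+^M.
    """
    grid = list(product(range(box + 1), repeat=M))

    def geq(y, z):
        return all(a >= b for a, b in zip(y, z))

    def add(y, z):
        return tuple(a + b for a, b in zip(y, z))

    for y, y2 in product(grid, repeat=2):
        if not geq(y, y2):
            continue
        if w(y) < w(y2) - tol:  # 1. monotonicity: w(y) >= w(y') for y >= y'
            return False
        for e in grid:  # 2. w(y + e) - w(y) <= w(y' + e) - w(y') for y >= y'
            if w(add(y, e)) - w(y) > w(add(y2, e)) - w(y2) + tol:
                return False
    return True


# Hypothetical example rewards.
def capped(y):
    return min(4.0, 3.0 * y[0] + 1.0 * y[1])


def convex(y):
    return float(sum(y)) ** 2
```

Note that the inequality is required for every $y \ge y'$, not only for $y$ and $y'$ differing in one coordinate; this is the diminishing-returns form of submodularity on the integer lattice.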
Such a class of functions clearly satisfies Assumption 1. We need to demonstrate the VFM and
IR properties. Recall that we will consider a clairvoyant optimal algorithm that knows a priori the
realizations of the sample paths of the $P_t$ processes. We first demonstrate the IR property. It turns
out that doing so requires only the monotonicity of $w$; the submodularity of $w$ is not required for
this property to hold.

Lemma 1. (Immediate Rewards) For submodular stochastic depletion problems, we have, for all
$s \in \mathcal{S}$ and $a \in \times_m\{0, 1, \ldots, x(s)_m\}$,

$$J^*(s) \le w(\bar{x} - x(s) + a) - w(\bar{x} - x(s)) + J^*(\tilde{S}_a(s)).$$

Proof: Consider using the optimal policy starting at state $s$, and let $S^*_T$ be the random state
under this policy at the end of the time horizon (that is, at time $T$), so that:

$$J^*(s) = \mathbb{E}\left[w(\bar{x} - x(S^*_T))\right] - w(\bar{x} - x(s)) \tag{3}$$

where the expectation is over the randomness in the system, namely the random item depletion
defined by the $P_t$ sequences and chosen activities. Similarly, let $S^{a}_T$ be the random state under the
optimal policy at the end of the time horizon upon starting in state $\tilde{S}_a(s)$; as above, we note:

$$J^*(\tilde{S}_a(s)) = \mathbb{E}\left[w(\bar{x} - x(S^{a}_T))\right] - w(\bar{x} - x(s) + a).$$

Let us reconsider the optimal policy starting at state $s$ and, in particular, let us partition the
initial set of items into a set of fictitious and real items; we assume that we begin with $a_m$ fictitious
items of type $m$ and $x(s)_m - a_m$ real items of type $m$. This partitioning serves purely as a labeling
of items and does not impact the evolution of the system in any fashion. In particular, if at some
point in time $t$ we have $x^f_{t,m}$ and $x^r_{t,m}$ fictitious and real items of type $m$ respectively, then using
activity $A$ results in the depletion of $X^f_{t,m}$ and $X^r_{t,m}$ fictitious and real items respectively, where
$X^f_{t,m}$ is a Binomial-$(x^f_{t,m}, P_{t,m}(A))$ random variable and $X^r_{t,m}$ is a Binomial-$(x^r_{t,m}, P_{t,m}(A))$ random
variable (so that $X^f_{t,m} + X^r_{t,m} = X_{t,m}$), and we are left with $x^f_{t,m} - X^f_{t,m}$ and $x^r_{t,m} - X^r_{t,m}$ fictitious
and real items respectively. Let $\sum_t X^f_{t,m}$ ($\sum_t X^r_{t,m}$) denote the number of fictitious (real) items of
type $m$ depleted by the end of the time horizon by the optimal policy starting in state $s$.
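The claim that the fictitious/real labeling does not affect the dynamics rests on a standard fact: if $a$ of $n$ items are labeled fictitious and each item is depleted independently with the same probability $p$, then $X^f + X^r$, with $X^f \sim$ Binomial$(a, p)$ and $X^r \sim$ Binomial$(n - a, p)$ independent, has the Binomial$(n, p)$ law. The sketch below verifies this exactly with rational arithmetic for a hypothetical $n$ and $p$; it illustrates the identity (Vandermonde's convolution) rather than proving it.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Exact Binomial(n, p) pmf as a list indexed by k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def labeled_total_pmf(n, a, p):
    """Law of X^f + X^r when a of the n items are labeled fictitious, with
    X^f ~ Binomial(a, p) and X^r ~ Binomial(n - a, p) independent (a convolution)."""
    f, r = binom_pmf(a, p), binom_pmf(n - a, p)
    out = [Fraction(0)] * (n + 1)
    for i, fi in enumerate(f):
        for j, rj in enumerate(r):
            out[i + j] += fi * rj
    return out
```

Using `Fraction` keeps every probability exact, so the comparison with `binom_pmf(n, p)` can be an exact equality rather than a floating-point tolerance.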

We now make two critical observations: 

1. We observe that ^2 t x[ m < a m for all m by construction. 

2. Since, given $s$ and a choice of activity, the depletion of a given item of type $m$ at time $t$ is independent of the past and of the depletion of any other item in the system at that time, $\sum_t X^r_{t,m}$ may be viewed as the number of items of type $m$ depleted under some induced randomized sub-optimal policy, say $\pi'$, starting at state $S_a(s)$. This induced policy $\pi'$ assumes in state $S_a(s)$ the existence of an additional $a_m$ items of each type $m$ and simulates the depletion of those items without garnering any reward for them. It operates as the optimal policy would, but on this modified state. More specifically, letting $s_0 = S_a(s)$ and $\alpha_0 = a$, we have:

$$\pi'(s_0) = \pi^*((x(s_0) + \alpha_0, t(s_0))).$$

Defining $\alpha_1 = x(S((\alpha_0, t(s_0)), \pi'(s_0)))$ and $s_1 = S(s_0, \pi'(s_0))$,

$$\pi'(s_1) = \pi^*((x(s_1) + \alpha_1, t(s_1))).$$

In general, defining $\alpha_t = x(S((\alpha_{t-1}, t(s_{t-1})), \pi'(s_{t-1})))$ and $s_t = S(s_{t-1}, \pi'(s_{t-1}))$,

$$\pi'(s_t) = \pi^*((x(s_t) + \alpha_t, t(s_t))).$$

It is worth noting that $\alpha_0 - \alpha_t = \sum_{t'=t(s)}^{t(s)+t-1} X^f_{t'}$ while $x(s_0) - x(s_t) = \sum_{t'=t(s)}^{t(s)+t-1} X^r_{t'}$. We consequently have:

$$
\begin{aligned}
J^*(S_a(s)) &\ge J^{\pi'}(S_a(s)) \\
&= \mathbb{E}\Big[w\Big(x - x(s) + a + \sum_t X^r_t\Big)\Big] - w(x - x(s) + a) \\
&\ge \mathbb{E}\Big[w\Big(x - x(s) + \sum_t X^f_t + \sum_t X^r_t\Big)\Big] - w(x - x(s) + a) \\
&= \mathbb{E}\big[w(x - x(S^*_T))\big] - w(x - x(s) + a) \\
&= J^*(s) + w(x - x(s)) - w(x - x(s) + a),
\end{aligned}
$$

where the first inequality follows from the optimality of $\pi^*$ among all non-anticipatory policies. The first equality follows from our definition of the policy $\pi'$ in Observation 2 and from the definition of $S_a(s)$. The second inequality follows from the monotonicity of the function $w$ and Observation 1: $\sum_t X^f_t \le a$. The second equality is again by our construction of the $X^f$ and $X^r$ processes. The final equality follows from (3). This completes the proof. □

While the IR property required only the monotonicity of $w$, the VFM property requires both the monotonicity of $w$ and its submodularity. This result is intuitive: a controller that starts at state $s$ may simply assume that it starts at state $s'$ and track state evolution accordingly. Assuming submodular rewards, applying the optimal policy to this (incorrectly tracked) state trajectory guarantees the controller a total expected reward of at least $J^*(s')$, so the optimal policy must certainly do at least as well. The role of submodularity here is somewhat subtle, but it is simple to construct counterexamples in its absence. We have:

Lemma 2. (Value Function Monotonicity) For Submodular Stochastic Depletion problems, for all $s, s' \in \mathcal{S}$ such that $x(s) \ge x(s')$ and $t(s) = t(s')$, we have $J^*(s) \ge J^*(s')$.

Proof: Consider a coupling of the systems starting at states $s$ and $s'$ wherein both systems witness identical sample paths for the item depletion processes defined by the $P_t$ sequences. More precisely,

assuming that at time $t$ the systems are in states $s_t$ and $s'_t$, respectively, then, given $P_t$, the numbers of items depleted in both systems are coupled so that if $x(s_t) \ge x(s'_t)$ and we employ activity set $A$ in both systems, then, for all $m$, the number of successfully depleted items of type $m$ in the $s_t$ system, $X_{t,m} \sim \text{Binomial}(x_{t,m}, P_{t,m}(A))$, and the number of successfully depleted items of type $m$ in the $s'_t$ system, $X'_{t,m} \sim \text{Binomial}(x'_{t,m}, P_{t,m}(A))$, satisfy $X_{t,m} = X'_{t,m} + Y_{t,m}$, where $Y_{t,m}$ is an independent Binomial-$(x(s_t)_m - x(s'_t)_m, P_{t,m}(A))$ random variable. A symmetric situation holds if $x(s'_t) \ge x(s_t)$.

Now assume that the system starting at $s'$ uses an optimal policy, whereas the system starting at state $s$ mimics the actions of the $s'$ system (call this policy $\tilde{\pi}$). It is simple to see that $\tilde{\pi}$ is an admissible non-anticipatory policy.

Under our coupling, we have at $t = t(s)$ that the number of items of type $m$ depleted in the system starting at state $s$ is at least the number of items of type $m$ depleted in the system starting at state $s'$. That is, $X_{t,m} \ge X'_{t,m}$. It then follows that

$$
\begin{aligned}
R(s, \tilde{\pi}(s)) &= w(x - x(s) + X_t) - w(x - x(s)) \\
&\ge w(x - x(s) + X'_t) - w(x - x(s)) \\
&\ge w(x - x(s') + X'_t) - w(x - x(s')) \\
&= R(s', \pi^*(s')).
\end{aligned}
$$

That is, the reward earned in the system starting at state $s$ is at least that earned in the system starting at state $s'$; the first inequality above uses the monotonicity of $w$, and the second employs the submodularity of $w$ (note that $x - x(s) \le x - x(s')$). In addition, by our coupling, both systems transition to states $s_{t(s)+1}$ and $s'_{t(s)+1}$, respectively, satisfying $x(s_{t(s)+1}) = x(s) - X_t = (x(s') - X'_t) + (x(s) - x(s') - Y_t) \ge x(s') - X'_t = x(s'_{t(s)+1})$ (since $Y_t \le x(s) - x(s')$), so that we may repeat the above argument for time $t(s) + 1$. Continuing in this fashion, we see that in every time step the $\tilde{\pi}$-controlled system starting at state $s$ earns at least as large a reward as the $\pi^*$-controlled system starting in state $s'$. Taking expectations over the random item depletions (i.e., the $X_t$ and $X'_t$ processes), we have $J^{\tilde{\pi}}(s) \ge J^*(s')$. Since $J^*(s) \ge J^{\tilde{\pi}}(s)$, we are done. □
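The coupling used in this proof is easy to simulate: the two systems share the draws of the items they have in common, and the larger system adds independent draws for its extra items. The sketch below is a minimal illustration for one item type and one period; the function name `coupled_depletions` and its arguments are hypothetical.

```python
import random


def coupled_depletions(x_big, x_small, p, rng):
    """Sample (X, X') for one item type under the coupling used for Lemma 2.

    Requires x_big >= x_small >= 0. Both systems share the draws of the first
    x_small items (giving X'), and the x_big - x_small extra items of the larger
    system contribute an independent Binomial(x_big - x_small, p) term Y.
    """
    if not x_big >= x_small >= 0:
        raise ValueError("need x_big >= x_small >= 0")
    shared = sum(rng.random() < p for _ in range(x_small))          # X'
    extra = sum(rng.random() < p for _ in range(x_big - x_small))   # Y
    return shared + extra, shared                                   # (X, X')
```

On every sample path both $X \ge X'$ and $x_{\text{big}} - X \ge x_{\text{small}} - X'$ hold, which is what lets the argument be repeated period after period.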
In light of Lemmas 1 and 2, Theorem 2 lets us conclude that the myopic heuristic is a 2-approximation algorithm for Submodular Stochastic Depletion problems.
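To see the guarantee on a concrete instance, one can compute $J^*$ by backward induction on a tiny problem and compare it with the value of the myopic policy. The sketch below is a minimal illustration under assumed data: the two item types, the horizon $T = 2$, the table `P` of depletion probabilities and the reward $w(y) = \sqrt{y_1 + 2y_2}$ (monotone and submodular, being a concave function of a non-negative linear form) are all hypothetical. It enumerates binomial outcomes exactly, so it is only practical for very small item counts.

```python
import itertools
import math
from functools import lru_cache

# Hypothetical toy instance: two item types, horizon T = 2, two activities.
X0 = (2, 1)                 # initial item counts x
T = 2
ACTIVITIES = (0, 1)
# P[t][a][m]: depletion probability of a type-m item at time t under activity a.
P = (
    ((0.9, 0.0), (0.2, 0.8)),
    ((0.5, 0.1), (0.0, 0.6)),
)


def w(y):
    """Monotone, submodular reward of the depleted-count vector y."""
    return math.sqrt(y[0] + 2 * y[1])


def outcomes(xs, probs):
    """Yield (depletion vector, probability) for independent binomial depletions."""
    for k in itertools.product(*(range(n + 1) for n in xs)):
        pr = 1.0
        for n, j, p in zip(xs, k, probs):
            pr *= math.comb(n, j) * p**j * (1 - p) ** (n - j)
        yield k, pr


def q_value(xs, t, a, cont):
    """Expected reward g at time t under activity a, plus cont(next counts, t + 1)."""
    used = tuple(x0 - x for x0, x in zip(X0, xs))          # x - x_t
    total = 0.0
    for k, pr in outcomes(xs, P[t][a]):
        gain = w(tuple(u + j for u, j in zip(used, k))) - w(used)
        total += pr * (gain + cont(tuple(x - j for x, j in zip(xs, k)), t + 1))
    return total


def zero(_xs, _t):
    return 0.0


@lru_cache(maxsize=None)
def j_opt(xs, t):
    """Optimal value J* of the state (xs, t) by backward induction."""
    if t == T:
        return 0.0
    return max(q_value(xs, t, a, j_opt) for a in ACTIVITIES)


@lru_cache(maxsize=None)
def j_myopic(xs, t):
    """Value of the policy that maximizes the one-period expected reward."""
    if t == T:
        return 0.0
    a = max(ACTIVITIES, key=lambda b: q_value(xs, t, b, zero))
    return q_value(xs, t, a, j_myopic)


print(j_opt(X0, 0), j_myopic(X0, 0))
```

On any instance satisfying the assumptions of the two lemmas, the printed values should satisfy $J^{\pi_g} \le J^* \le 2J^{\pi_g}$.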



4.2 Linear Decaying Stochastic Depletion Problems 

We consider here a family of stochastic depletion problems, closely related to the family just considered, that also admits the VFM and IR properties and is particularly useful for many applications. As before, we consider problems for which $\{P_t(A)\}$ is a $[0,1]^M$-valued stochastic process for all $A \in \mathcal{A}$. Assuming one chooses action $A$ at time $t$, $X_{t,m}$ is a Binomial-$(x_{t,m}, P_{t,m}(A))$ random variable that, given $x_t$ and $P_{t,m}(A)$, is independent of the past and of $X_{t,m'}$ for $m' \ne m$. We assume linear rewards that are non-increasing in time. In particular, we assume $g(x_t, x_{t+1}, t) = \sum_m w_{m,t}(x_{t,m} - x_{t+1,m})$, where $w_{m,t}$ is a non-negative, non-increasing function of $t$ for all $m$ (in the special case where $w_{m,t} = w_m$ for all $t$, this is merely a special case of the submodular stochastic depletion model we have already considered). We can verify the immediate rewards property for such systems via a proof that closely follows that of Lemma 1 and may be found in the appendix:

Lemma 3. (Immediate Rewards) For Linear Decaying Stochastic Depletion problems, for all $s \in \mathcal{S}$ and $a \in \times_m \{0, 1, \ldots, x(s)_m\}$,

$$J^*(S_a(s)) \ge J^*(s) - \sum_m w_{m,t(s)}\, a_m.$$
In addition, we may verify the VFM property. The proof of the following lemma is essentially identical to that of Lemma 2 and is omitted.

Lemma 4. (Value Function Monotonicity) For Linear Decaying Stochastic Depletion problems, for all $s, s' \in \mathcal{S}$ such that $x(s) \ge x(s')$ and $t(s) = t(s')$, we have $J^*(s) \ge J^*(s')$.

In light of Lemmas 3 and 4, Theorem 2 lets us conclude that the myopic heuristic is a 2-approximation algorithm for Linear Decaying Stochastic Depletion problems.
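The reward structure of this family is simple enough to state directly in code. The sketch below evaluates $g(x_t, x_{t+1}, t) = \sum_m w_{m,t}(x_{t,m} - x_{t+1,m})$ and checks the standing assumption that each $w_{m,t}$ is non-negative and non-increasing in $t$; the function names and the halving weight schedule are hypothetical examples.

```python
def linear_decaying_reward(x_t, x_next, t, weight):
    """g(x_t, x_{t+1}, t) = sum_m w_{m,t} (x_{t,m} - x_{t+1,m})."""
    return sum(weight(m, t) * (a - b) for m, (a, b) in enumerate(zip(x_t, x_next)))


def is_valid_weight(weight, num_types, horizon):
    """Check that each w_{m,t} is non-negative and non-increasing in t."""
    for m in range(num_types):
        seq = [weight(m, t) for t in range(horizon)]
        if any(v < 0 for v in seq) or any(b > a for a, b in zip(seq, seq[1:])):
            return False
    return True


# Hypothetical weights: a base value per item type, halved every period.
BASE = (3.0, 1.0)


def halving(m, t):
    return BASE[m] * 0.5 ** t


print(linear_decaying_reward((2, 1), (1, 0), 0, halving))  # prints 4.0
```

Depleting the same items one period later earns half as much under these weights, which is the "decaying" incentive that the myopic policy exploits.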

4.3 A Worst Case Example 

Having established the VFM and IR properties for the two families of stochastic depletion problems just discussed, we immediately have that the myopic policy has expected value within a factor of 2 of the optimal policy for problems from either family. This analysis is sharp. In particular, we now present a problem instance that is in fact a member of both problem families and for which the optimal policy has expected value that is a factor of $2 - \epsilon$ larger than that of the myopic policy, where $\epsilon > 0$ can be made arbitrarily small.

Example 1. (Myopic Sub-Optimality) Consider the case where $M = 2$ and $T = 2$, with $g(x, x', t) = (x_1 - x'_1) + (1 - \epsilon)(x_2 - x'_2)$. Assume that $x_1 = x_2 = 1$ and that $x_{0,1} = x_{0,2} = 1$. Let $\mathcal{A} = \{1, 2\}$. The (deterministic) $P_t$ processes are defined as:

$$
\begin{aligned}
\text{For } A = 1&: \quad P_{0,1}(1) = 1,\ P_{0,2}(1) = 0,\ P_{1,1}(1) = 1,\ P_{1,2}(1) = 0. \\
\text{For } A = 2&: \quad P_{0,1}(2) = 0,\ P_{0,2}(2) = 1,\ P_{1,1}(2) = 0,\ P_{1,2}(2) = 0.
\end{aligned}
$$

In words, the item of type $m = 1$ may be depleted in either time step via the use of $A = 1$, whereas the item of type $m = 2$ may be depleted only in the first time step via $A = 2$. Only one of activities 1 and 2 may be employed within a given time step. The myopic heuristic will first choose activity set $\{1\}$ (which earns a reward of 1 via the depletion of the $m = 1$ type item) over activity set $\{2\}$ (which earns a reward of $1 - \epsilon$ via the depletion of the $m = 2$ type item). Consequently, under the myopic heuristic, $x_{1,1} = 0$ and $x_{1,2} = 1$, and the heuristic is unable to deplete the one remaining item in the second time step, earning a total reward of 1. An optimal schedule would instead first choose activity set $\{2\}$ (which earns a reward of $1 - \epsilon$ via the depletion of the $m = 2$ type item). Consequently, under the optimal schedule, $x_{1,1} = 1$ and $x_{1,2} = 0$, and the schedule is able to deplete the one remaining item in the second time step via the use of activity set $\{1\}$, earning a total reward of $2 - \epsilon$. We thus see that $J^*(s_0) = (2 - \epsilon) J^{\pi_g}(s_0)$ here.
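Example 1 can be checked mechanically. Since the instance is deterministic, enumerating all open-loop schedules gives the optimal value exactly; the myopic run picks, in each period, the activity with the largest one-period reward. The helper names below are illustrative, and ties are broken in favor of activity 1.

```python
from itertools import product

# Example 1: P[t][a] = (P_{t,1}(a), P_{t,2}(a)) for activities a in {1, 2}.
P = {0: {1: (1, 0), 2: (0, 1)}, 1: {1: (1, 0), 2: (0, 0)}}


def step(x, t, a, eps):
    """Deterministic depletion at time t under activity a; returns (reward, next x)."""
    weight = (1.0, 1.0 - eps)
    reward, nxt = 0.0, list(x)
    for m in range(2):
        if x[m] and P[t][a][m]:
            reward += weight[m]
            nxt[m] -= 1
    return reward, tuple(nxt)


def total_reward(schedule, eps):
    x, total = (1, 1), 0.0
    for t, a in enumerate(schedule):
        r, x = step(x, t, a, eps)
        total += r
    return total


def myopic_reward(eps):
    x, total = (1, 1), 0.0
    for t in range(2):
        a = max((1, 2), key=lambda b: step(x, t, b, eps)[0])  # best one-period reward
        r, x = step(x, t, a, eps)
        total += r
    return total


def optimal_reward(eps):
    # The system is deterministic, so enumerating open-loop schedules is exact.
    return max(total_reward(s, eps) for s in product((1, 2), repeat=2))


print(myopic_reward(0.1), optimal_reward(0.1))
```

The ratio of the two printed values is $2 - \epsilon$, matching the calculation above.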

5 Applications: Stochastic Control 

In the previous section, we presented two families of stochastic depletion problems for which the myopic heuristic is a 2-approximation algorithm. We now consider several problems of stochastic control that are easily seen to be members of these families. We thereby establish uniform performance guarantees for myopic policies for these stochastic control problems.

5.1 Service Policies for Simple Queueing Models 

The following is a discrete-time version of a 'call-center' queueing model that has received a good deal of recent attention: we have $I$ buffers and $J$ servers. Each buffer sees a general discrete-time arrival process, with the restriction that a given buffer can see at most a single arrival in a given time slot. For example, each buffer $i$ may see an independent Bernoulli($\lambda_i$) arrival process. A given server $j$ may be used to service any single job in the system in a given time slot. In particular, should server $j$ be used to service a job arriving to buffer $i$, the service time is assumed to be an independent geometric random variable with mean $\mu_{ij}$ (possibly $\infty$). We allow for pre-emption in our service discipline. Consider the following natural objective: completion of a job that arrives at buffer $i$ earns a non-negative reward $r_{i,d}$, where $d$ is the time that job has remained in the system (that is, the delay experienced by that job). We assume $r_{i,d}$ is non-increasing in $d$. At every point in time, one must decide on a matching between servers and available jobs with a view to maximizing the expected reward earned over $T$ periods.

It is not difficult to see that the above problem is an example of a Linear Decaying Stochastic Depletion Problem. In particular, we define an item type for every tuple $(i, \tau)$, where $i = 1, 2, \ldots, I$ and $\tau = 0, 1, \ldots, T - 1$. Thus an item type $m$ is associated with an arrival buffer $i_m$ and an arrival time $\tau_m$. We can have at most a single item of a given type, i.e., $x_m = 1$. The set of feasible activities $\mathcal{A}$ is simply the set of all matchings of servers to item types. Given a particular matching, the probability of depletion for a given item type (or job) is determined by the server matched to that job, and is 0 if no server is matched to it. Of course, a job may not be depleted prior to its arrival. In particular, we have for item type $m = (i_m, \tau_m)$



We define our reward function $g$ according to $g(x_t, x_{t+1}, t) = \sum_m w_{m,t}(x_{t,m} - x_{t+1,m})$, where we assume $w_{m,t} = r_{i_m,(t-\tau_m)^+}$. In particular, the reward generated in the $t$th time step is given by $\sum_{i=1}^{I} \sum_{t'=0}^{t} r_{i,t-t'} X_{i,t-t'}$, where $X_{i,t-t'} = 1$ if a job arriving to buffer $i$ at the start of time step $t'$ was completed at time $t$, and is 0 otherwise. Since both the VFM and IR properties hold for this family of stochastic depletion problems, we have via Lemmas 3 and 4 and Theorem 2 that the myopic policy generates total expected rewards that are within a factor of 2 of the optimal policy. In fact, we have shown that this performance guarantee holds relative to an optimal policy that has full knowledge of the entire job-arrival process!
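In this model the myopic policy, at each slot, picks the server-to-job matching with the largest expected one-slot reward. Under the assumption that a geometric service time with mean $\mu_{ij}$ (on $\{1, 2, \ldots\}$) means a per-slot completion probability of $1/\mu_{ij}$, a matched job waiting $d$ slots contributes $r_{i,d}/\mu_{ij}$. The brute-force sketch below, with hypothetical names, is only meant for small instances; for larger ones a maximum-weight bipartite matching routine (for instance `scipy.optimize.linear_sum_assignment`) would be the natural replacement.

```python
import itertools


def myopic_matching(jobs, servers, reward, mean_service, t):
    """Server-to-job matching that maximizes the expected reward earned in slot t.

    jobs: list of (buffer i, arrival slot tau) for jobs waiting at time t.
    servers: list of server labels j.
    reward(i, d): r_{i,d}, non-negative and non-increasing in the delay d.
    mean_service[(i, j)]: mean geometric service time mu_{ij} (may be float('inf')).
    A matched job completes within the slot with probability 1 / mu_{ij}.
    Brute force over all partial matchings: only suitable for small instances.
    """
    best_value, best_match = 0.0, {}
    options = list(range(len(jobs))) + [None] * len(servers)   # None = server idles
    for choice in itertools.permutations(options, len(servers)):
        value, match = 0.0, {}
        for j, idx in zip(servers, choice):
            if idx is None:
                continue
            i, tau = jobs[idx]
            value += reward(i, t - tau) / mean_service[(i, j)]
            match[j] = jobs[idx]
        if value > best_value:
            best_value, best_match = value, match
    return best_match, best_value
```

Because each job index appears only once among the options, no job is assigned to two servers, while any number of servers may idle.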

A natural continuous-time variant of the problem above has been the subject of much recent study, and results on throughput optimality in a certain heavy-traffic regime are available (see, for instance, Bassamboo et al. (2006b)); the formulation we have discussed focuses on a different objective and complements that body of work. It is also interesting to note the analogy of our myopic policy with the so-called $c\mu$ scheduling rules (see, for instance, van Mieghem (1995)) for scheduling jobs arriving to multiple buffers served by a single server with a view to minimizing total delay cost (every job incurs a buffer-dependent, typically linear, delay cost). In our model, we maximize reward as opposed to minimizing cost; rewards decrease with delay but are necessarily non-negative.



5.2 Stochastic Broadcast Scheduling 

We consider a broadcast communication system where a single data item may be simultaneously 
transmitted to multiple users. In particular, we consider the following problem: we have a set of 
$U$ users (indexed by $u$) and a finite set of data items or 'pages', $\mathcal{P} = \{1, \ldots, n\}$, indexed by $i$. In every time slot $\tau \in \{1, \ldots, T - 1\}$, any given user may generate a request for some page (or pages) he has not requested in the past. We assume that every request for a page is associated with a deadline $d \in \{1, \ldots, T - 1\}$. Should a request for page $i$ by user $u$ be successfully satisfied prior to its deadline, the transmitter earns a non-negative reward $r^i_u$. We assume that the arrival process governing requests from users, as well as the deadlines associated with those requests, are exogenous stochastic processes, and we further assume a (known) bound on the number of requests that may arrive in any given time slot. In each time slot, a single page can be transmitted (although in what follows we could as well allow up to $b$ pages). Due to the broadcast nature of the system, this page may be transmitted simultaneously to up to $k \ge 1$ users. The communication channel to users is stochastic, so that should a page be transmitted to a particular user $u$ in time $t$, that user receives the page with some channel-dependent probability $P^u_t$, which is itself an exogenous stochastic process. In each time slot, one must decide which page to transmit and to which $k$ users in order to maximize the expected reward accrued over $T$ time slots.

Approximation algorithms for deterministic broadcast scheduling (where transmissions are successful with probability 1, so that $P^u_t = 1$ for all $t, u$) have received quite a bit of attention. The best known approximation algorithm is a 4-approximation due to Bar-Noy et al. (2002). Without any constraints on the number of requests which can be satisfied by a single broadcast (i.e., $k = \infty$), the best known algorithm is a 4/3-approximation algorithm due to Gandhi et al. (2002). The best known online algorithm for the same is a 2-approximation due to Kim and Chwa (2004). Specializing to this deterministic case, our myopic online algorithm improves upon the offline results for 'finite batching' in Bar-Noy et al. (2002), albeit for uniform item sizes. Modeling stochasticity in communication channels to users is important since, in real-world systems, congestion and various physical phenomena cause significant uncertainty in the successful transmission of pages. Scheduling communications over stochastic channels is, of course, the focus of a substantial body of work in communications engineering. See, for instance, Eryilmaz et al. (2005), Su and Tassiulas (1997), and Ren et al. (2002) for models closely related to the broadcast scheduling model we have presented. Most of that body of work is either simulation driven or else focuses on coarser performance metrics (such as throughput optimality).

The stochastic broadcast scheduling problem we have presented may be cast as a Linear Decaying Stochastic Depletion Problem. Every request is associated with four parameters $(u, i, \tau, d)$ representing the user, page, time of request, and request deadline, respectively. We associate an item type with each such request. Thus an item type $m$ is identified by a request by user $u_m$ for page $i_m$, with arrival time $\tau_m$ and deadline $d_m$. An activity $A \in \mathcal{A}$ is simply an assignment of a given page to $k$ users. Given a particular choice of activity $A$, the probability of depletion of a particular item type is simply given by the quality of the channel to the user corresponding to that type, $P^u_t$, provided that user $u$ is served under activity $A$; else it is 0. Of course, a request may not be satisfied prior to its arrival or following the expiration of its deadline, so that for item type $m = (u_m, i_m, \tau_m, d_m)$,

$$P_{t,m}(A) = \mathbf{1}\{\tau_m \le t < d_m\}\, \mathbf{1}\{(i_m, u_m) \in A\}\, P^{u_m}_t.$$

We define our reward function $g$ according to $g(x_t, x_{t+1}, t) = \sum_m w_m (x_{t,m} - x_{t+1,m})$, where we assume $w_m = r^{i_m}_{u_m}$. This is a Linear Decaying Stochastic Depletion Problem, and since both the VFM and IR properties hold for this family of stochastic depletion problems, we have via Theorem 2 that the myopic policy generates expected value that is within a factor of 2 of the optimal policy.
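For a fixed page, the expected one-slot reward is additive over the users served, so the myopic step reduces to taking, for each page, the (at most) $k$ pending requests with the largest $r^i_u P^u_t$ and then choosing the best page. The sketch below assumes a hypothetical `Request` record and that `pending` contains only unserved requests.

```python
from typing import NamedTuple


class Request(NamedTuple):
    user: str
    page: int
    tau: int        # arrival slot
    deadline: int   # request must be served at a slot t < deadline
    reward: float   # r_u^i


def myopic_broadcast(pending, channel, t, k):
    """Pick the page and (at most k) users maximizing expected reward in slot t.

    pending: unserved requests; channel[u]: success probability P_t^u.
    Returns (page, users, expected_reward); page is None if nothing is servable.
    """
    gains = {}
    for r in pending:
        if r.tau <= t < r.deadline:
            gains.setdefault(r.page, []).append((r.reward * channel[r.user], r.user))
    best = (None, [], 0.0)
    for page, cands in gains.items():
        top = sorted(cands, reverse=True)[:k]     # k best users for this page
        value = sum(g for g, _ in top)
        if value > best[2]:
            best = (page, [u for _, u in top], value)
    return best
```

Re-running this step every slot with freshly estimated channel probabilities is the "static channel plus re-estimation" design pattern discussed below.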

It is interesting to consider a special case of the stochastic broadcast scheduling problem we have presented. In particular, assuming that all requests are known at time $t = 0$ and further that all these requests have deadline $T$, the myopic policy is in fact optimal if the channel to each user is 'static', i.e., $P^u_t = C_u$ for all $u$ and $t$. This is established via an interchange argument; the proof of the following lemma may be found in the appendix.

Lemma 5. If $P^u_t$ is constant ($= C_u$) for all $u$, then $J^{\pi_g}(s_0) = J^*(s_0)$ for all $s_0 \in \mathcal{S}$.

Lemma 5 allows one to interpret the myopic heuristic for the general stochastic broadcast scheduling problem as one that, at every time step $t$, makes the simplifying assumption that all channels are static with success probabilities given by $P^u_t$ and that no further arrivals will be observed. This is, in fact, a common engineering design principle for scheduling over dynamic channels. For instance, Tsibonis and Georgiadis (2005), Dua and Bambos (2006a, 2007), Huang et al. (2005), Dua and Bambos (2006b), and Chou and Miao (2006) all derive optimal scheduling policies for problems similar to the broadcast scheduling problem here under the assumption of a static channel and other simplifying assumptions. The hope is that, in conjunction with frequent channel state re-estimation (that is, frequent re-estimation of channel success probabilities), the use of scheduling schemes so derived may prove to be a very effective heuristic. In addition to being simple to implement and typically fast in practice, such an approach is robust to errors in specifying channel dynamics. Lemma 5 and Theorem 2 thus lend theoretical support to this popular design principle. In particular, one may simply design a scheduling scheme assuming a static channel, and then employ this scheme in tandem with repeated channel re-estimation. Put another way, simply accounting for the current channel state suffices to obtain performance within a factor of 2 of optimal.



5.3 Dynamic Product Line Design 



Consider a firm that is capable of producing an array of related products that may potentially be sold to one or more customer segments, each distinguished by its willingness to pay for various product features. For a variety of reasons (manufacturing capacity and cost, marketing capabilities, etc.), the firm may be constrained in the number of different products it is capable of simultaneously offering for sale. Further, external competition may impose limitations on the prices the firm can post for a given product. Faced with these restrictions, the firm must decide on a product line to offer with a view to maximizing revenues. This is the essence of product line 'design' problems that have been extensively considered in the operations research and marketing literature. For instance, the classic third-degree price discrimination model of Pigou (1978) forms the basis of design principles that center on explicit market segmentation (see Frank et al. (1972)). Alternatively, assuming a model of customer preference for various product attributes, one may consider optimizing the attributes of products offered for sale so as to maximize revenues; customers 'self-select' the product types that are of greatest appeal in this case. A number of product line design problems of this type have been considered in past literature; Moorthy (1984), Kohli and Sukumar (1990), van Ryzin and Mahajan (1999), and Hopp and Xu (2005) are a few examples. A common thread in this work, however, is its consideration of static models. In reality, demand shocks and demand seasonality make the optimal product line design problem an inherently dynamic one. For instance, consider the following example, which illustrates the importance of accounting for seasonality in demand:

Example 2. A firm may offer at most one of two products ('outdated' or 'new') for sale in each of two successive sales epochs to two distinct consumer segments: 'bargain hunters' and 'early adopters'. Bargain hunters will purchase only the outdated product, for 1 dollar, in the first period with probability 1 and will make no purchase in period 2. Early adopters will purchase only the new product, with probability 1 in either epoch, for $1 + \epsilon$ dollars. Assume we begin with an equal number of consumers in both segments. A myopic strategy offers the new product in the first epoch (since $1 + \epsilon > 1$), after which no further sales are possible. It is clear that a product line selection strategy that accounts for seasonality (by delaying the introduction of the new product to the second period) will earn about twice the revenues earned by a myopic strategy over two sales epochs.
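The arithmetic of Example 2 can be reproduced with a small simulation. The sketch below assumes $n$ consumers per segment and hypothetical helper names; the myopic plan chooses, epoch by epoch, the product with the larger immediate revenue (ties go to the first product listed), while the best plan is found by enumerating all four two-epoch plans.

```python
from itertools import product

PRODUCTS = ("outdated", "new")


def step(state, epoch, offered, eps):
    """Revenue and next (hunters, adopters) when `offered` is sold in `epoch` (1 or 2)."""
    hunters, adopters = state
    if offered == "outdated":
        if epoch == 1:                    # bargain hunters buy only in epoch 1
            return hunters * 1.0, (0, adopters)
        return 0.0, state
    return adopters * (1 + eps), (hunters, 0)   # early adopters buy in either epoch


def revenue(plan, n, eps):
    state, total = (n, n), 0.0
    for epoch, offered in enumerate(plan, start=1):
        gain, state = step(state, epoch, offered, eps)
        total += gain
    return total


def myopic_plan(n, eps):
    state, plan = (n, n), []
    for epoch in (1, 2):
        offered = max(PRODUCTS, key=lambda p: step(state, epoch, p, eps)[0])
        plan.append(offered)
        state = step(state, epoch, offered, eps)[1]
    return tuple(plan)


n, eps = 100, 0.01
best = max(product(PRODUCTS, repeat=2), key=lambda plan: revenue(plan, n, eps))
print(myopic_plan(n, eps), revenue(myopic_plan(n, eps), n, eps))
print(best, revenue(best, n, eps))
```

The revenue ratio is $(2 + \epsilon)/(1 + \epsilon)$, which approaches 2 as $\epsilon \to 0$ but stays below it, consistent with the factor-of-2 guarantee.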

Motivated by the consideration of issues such as the above, we consider the following dynamic product line design problem: a firm is capable of offering products from some set $\mathcal{V}$ and must at any point in time offer a subset of products $A \subseteq \mathcal{V}$ with $|A| \le k$. The firm's products are purchased by $I$ consumer segments, and we let $x_{t,i}$ denote the size of the $i$th segment in the $t$th sales epoch. Assuming that the product line offered at time $t$ is $A$, any segment-$i$ consumer present in the market at that time will purchase a product in $A$ with known probability $P_{t,i}(A)$. Such a sale garners the seller revenue $p_i$, and the consumer is lost to the system. We assume that the firm has modeled the dynamics of the $P_t$ processes and wishes to maximize expected revenues over $T$ sales epochs. It is worthwhile discussing some of the salient features of our model:

1. We model the fact that a customer from a given segment may satisfy his requirement by the purchase of one of several product types, and that the probability of him purchasing a particular product type is influenced by the entire array of product types offered. Such a model permits customer self-selection, but under the restriction that all substitutes are offered at the same price; this is precisely the type of model considered by van Ryzin and Mahajan (1999). Alternatively, one may view the model as assuming that the seller has a means of directly segmenting customers (as in Frank et al. (1972)) and allowing for segment-specific prices.

2. We allow for general models of demand seasonality. In particular, we make no assumptions 
on the dynamics of the Pt processes. Further, we explicitly model the impact of current sales 
on future demands ('market saturation'). 

3. We assume that the product line designer has available an estimate of market size within each consumer segment. Such an assumption is potentially valid in several industries; see Yunes et al. (2007) for a practical discussion of this issue.



4. Prices for each consumer segment are fixed. In reality this may arise, for example, due to the 
need to align with prices offered by competitors. 

It is simple to cast the above model as a Linear Decaying Stochastic Depletion Problem. In particular, we associate with each customer segment $i$ an item type $m$. Our set of activities is $\mathcal{A} = \{A : A \subseteq \mathcal{V}, |A| \le k\}$, and depletion probabilities for item type $m$ are specified according to $P_{t,m}(A) = P_{t,i_m}(A)$. We define our reward function $g$ according to $g(x_t, x_{t+1}, t) = \sum_m w_{m,t}(x_{t,m} - x_{t+1,m})$, where we assume $w_{m,t} = p_{i_m}$. This is a Linear Decaying Stochastic Depletion Problem, so that Lemmas 3 and 4 with Theorem 2 immediately tell us that a myopic policy generates expected revenues within a factor of 2 of the optimal policy. From a managerial perspective, this suggests a robust recipe for dealing with demand shocks and seasonality: at every opportunity for product line update, one simply solves a static product line design problem with suitably revised estimates of the relevant customer demand model and market sizes. As such, much of the existing methodology for product line design can be brought to bear on the problem without a significant cost to optimality.
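The "static problem at every update" recipe can be sketched as follows: with current segment sizes $x_{t,i}$, choose the line $A$ with $|A| \le k$ maximizing $\sum_i p_i\, x_{t,i}\, P_{t,i}(A)$. The purchase model `best_fit` (a consumer buys with the highest affinity among the offered products) and the `AFFINITY` table are purely hypothetical; any demand model returning $P_{t,i}(A)$ could be substituted, and exhaustive enumeration is only reasonable for small catalogs.

```python
from itertools import combinations


def myopic_product_line(products, k, sizes, prices, purchase_prob):
    """Static product line maximizing expected revenue in the current epoch.

    sizes[i]: current size x_{t,i} of segment i; prices[i]: revenue p_i per sale.
    purchase_prob(i, line): P_{t,i}(line), probability a segment-i consumer buys.
    Enumerates every line with 1..k products, so it is only for small catalogs.
    """
    best_line, best_value = (), 0.0
    for size in range(1, k + 1):
        for line in combinations(products, size):
            value = sum(prices[i] * sizes[i] * purchase_prob(i, line)
                        for i in range(len(sizes)))
            if value > best_value:
                best_line, best_value = line, value
    return best_line, best_value


# Hypothetical choice model: a consumer buys with the highest affinity among offered products.
AFFINITY = {0: {"a": 0.5, "b": 0.1, "c": 0.0}, 1: {"a": 0.0, "b": 0.2, "c": 0.6}}


def best_fit(i, line):
    return max(AFFINITY[i][j] for j in line)


print(myopic_product_line(["a", "b", "c"], 2, sizes=[10, 20], prices=[2.0, 1.0],
                          purchase_prob=best_fit))
```

In a dynamic run, `sizes` would be updated after each epoch by removing the consumers who purchased.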



6 Applications: Stochastic variants of submodular maximization problems over Matroids

In this section, we turn our attention to the use of the stochastic depletion framework as a useful stochastic analogue to submodular maximization problems over simple matroids such as the cardinality matroid and the partition matroid. A number of hard deterministic optimization problems can frequently be reduced to problems of this nature, and in doing so, finding good approximation algorithms for these problems is reduced to the task of finding a good oracle for the myopic sub-problem (for a number of recently considered problems of this type, see, for instance, Goundan and Schulz (2007)). Our hope is to produce good approximation algorithms for useful stochastic variants of such problems. As an illustration, we will later consider an important stochastic generalization of the AdWords Assignment problem considered by Fleischer et al. (2006) and Goundan and Schulz (2007).



Given a set $E$, let $U = 2^E$. A cardinality matroid is a subset of $U$ of the type $\mathcal{M} = \{F \subseteq E : |F| \le k\}$, where $k$ is an integer. A partition matroid is a subset of $U$ of the type $\mathcal{M} = \{F \subseteq E : |F \cap E_i| \le k_i \ \forall i\}$, where we assume $E = \bigsqcup_{i=0}^{n} E_i$ is a disjoint union and that integers $k_i$ for $i = 0, 1, \ldots, n$ are given. Consider optimization problems of the form

$$\max_{A \in \mathcal{M}} f(A), \qquad (4)$$

where $f : 2^E \to \mathbb{R}_+$ is a non-decreasing, submodular function. A number of interesting combinatorial optimization problems reduce to such maximization problems where $\mathcal{M}$ is a cardinality or partition matroid. We begin by establishing how such deterministic optimization problems are captured within the stochastic depletion framework.

$\mathcal{M}$ is a cardinality matroid: We reduce (4) to a submodular stochastic depletion instance assuming $\mathcal{M}$ is a cardinality matroid. We are given $M = |E|$ item types and assume that we begin with a single item of each type, i.e., $x_m = x_{0,m} = 1$. Let $\mathcal{A} = \{1, 2, \ldots, |E|\}$, where $P_{t,m}(j) = 1$ if $m = j$ and 0 otherwise. That is, we define an item type for each element of $E$, and in every time step we are allowed to deplete at most one item. We select as our reward function $g(x_t, x_{t+1}, t) = f(x - x_{t+1}) - f(x - x_t)$ and set $T = k$. Observe that the value of an optimal solution to this problem is precisely $J^*(x, 0)$. With Lemmas 1 and 2, Theorem 2 then immediately yields:

Corollary 1. The myopic heuristic is a 2-approximation algorithm for maximizing a non-decreasing submodular function $f$ over a cardinality matroid.

We remark that this is a weaker result than the well-known optimal approximation ratio of $e/(e-1)$ due to Nemhauser et al. (1978). The analysis of Theorem 2 applies to a far broader class of problems, and in light of Example 1 we cannot expect a tighter guarantee for the greedy heuristic via that general line of analysis.
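Under the reduction above, the myopic heuristic is ordinary greedy: in each of the $T = k$ periods, add the element with the largest marginal gain $f(S \cup \{e\}) - f(S)$. The sketch below applies it to a small hypothetical maximum-coverage instance and compares it with a brute-force optimum; the instance and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical maximum-coverage instance: f(S) = number of ground items covered by S.
SETS = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}


def coverage(chosen):
    return len(set().union(*(SETS[e] for e in chosen)))


def greedy(elements, f, k):
    """Myopic heuristic under the cardinality-matroid reduction: k greedy additions."""
    chosen = frozenset()
    for _ in range(k):
        remaining = [e for e in elements if e not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda x: f(chosen | {x}) - f(chosen))
        chosen = chosen | {best}
    return chosen


def brute_force(elements, f, k):
    return max((frozenset(c) for c in combinations(elements, k)), key=f)


g = greedy(list(SETS), coverage, 2)
o = brute_force(list(SETS), coverage, 2)
print(sorted(g), coverage(g), sorted(o), coverage(o))
```

Here greedy covers 5 items against an optimum of 6, comfortably inside both the factor-2 bound and the sharper $e/(e-1)$ bound.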

$\mathcal{M}$ is a partition matroid: We reduce (4) to a submodular stochastic depletion instance assuming $\mathcal{M}$ is a partition matroid. We are given $M = |E|$ item types and set $\mathcal{A} = \{1, 2, \ldots, |E|\}$. We index the elements of $E$ by $m$, which identifies a particular element of $E$ with a particular item type, and assume that the first $|E_0|$ elements correspond to the elements of $E_0$, the next $|E_1|$ elements to the elements of $E_1$, and so forth. We set the time horizon $T = \sum_i k_i$ and define $n + 1$ partitions of this horizon according to $\mathcal{T}_j = \{\sum_{i=0}^{j-1} k_i, \ldots, \sum_{i=0}^{j} k_i - 1\}$ for $j = 0, 1, \ldots, n$. We assume $P_{t,m}(j) = 1$ iff $m = j$ and $t \in \mathcal{T}_l$, where $l$ is the index with $m \in E_l$. We select as our reward function $g(x_t, x_{t+1}, t) = f(x - x_{t+1}) - f(x - x_t)$. In words, we define an item type for each element of $E$ and identify each subset $E_j$ with a partition of time. At any point in time $t \in \mathcal{T}_j$, we are allowed to deplete at most one available item from the partition $E_j$. Observe that the value of an optimal solution to this problem is precisely $J^*(x, 0)$. The myopic heuristic for this stochastic depletion problem corresponds precisely to the 'local greedy' heuristic introduced by Fisher et al. (1978), and we recapture their result, namely:



Corollary 2. The local greedy heuristic is a 2-approximation for maximizing a non-decreasing submodular function $f$ over a partition matroid.
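The local greedy heuristic follows the time partition of the reduction: during block $\mathcal{T}_j$ it makes $k_j$ greedy picks restricted to $E_j$. The sketch below, on a small hypothetical coverage instance, shows a case where the early greedy choice from $E_0$ duplicates what $E_1$ can offer, so local greedy attains 2 against an optimum of 3; the helper names are illustrative.

```python
from itertools import combinations, product

# Hypothetical coverage instance with blocks E_0 = {A, B}, E_1 = {C}, caps k_0 = k_1 = 1.
SETS = {"A": {1, 2}, "B": {3}, "C": {1, 2}}
PARTS = [["A", "B"], ["C"]]
CAPS = [1, 1]


def coverage(chosen):
    return len(set().union(*(SETS[e] for e in chosen)))


def local_greedy(parts, caps, f):
    """Process E_0, E_1, ... in order, making k_j greedy picks from E_j (time block T_j)."""
    chosen = frozenset()
    for part, cap in zip(parts, caps):
        for _ in range(cap):
            remaining = [e for e in part if e not in chosen]
            if not remaining:
                break
            best = max(remaining, key=lambda x: f(chosen | {x}) - f(chosen))
            chosen = chosen | {best}
    return chosen


def brute_force(parts, caps, f):
    """Exact optimum over bases of the partition matroid (f is non-decreasing)."""
    blocks = [combinations(p, min(c, len(p))) for p, c in zip(parts, caps)]
    return max((frozenset().union(*pick) for pick in product(*blocks)), key=f)


print(coverage(local_greedy(PARTS, CAPS, coverage)), coverage(brute_force(PARTS, CAPS, coverage)))
```

Scaling such instances up pushes the ratio toward, but never past, the factor 2 of Corollary 2.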

Both classes of problems alluded to above have natural stochastic generalizations. As a simple example, one may consider a stochastic generalization of the problem of submodular maximization over a cardinality matroid, which we refer to as the 'stochastic selection problem': as opposed to selecting at most $k$ elements from $E$, one is allowed $k$ attempts at selecting elements of $E$. If at the $i$th selection attempt one attempts to select element $e \in E$, the attempt is successful with probability $P^e_i$, where $\{P^e_i\}$ is an arbitrary $[0,1]$-valued sequence specified for every $e \in E$. We would like to find an adaptive item selection policy that maximizes the expected value of the set of successfully selected items. It is easy to see that the stochastic selection problem includes as special cases appropriate stochastic generalizations of problems such as the maximum coverage problem. The problem of adaptively selecting items so as to maximize the expected value of the set of successfully selected items is seen to be a submodular stochastic depletion problem using precisely the reduction for the cardinality matroid above, and one immediately has the following result.

Lemma 6. The myopic heuristic is a 2-approximation for the stochastic selection problem.
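In the stochastic selection problem the myopic policy weighs each candidate's marginal value by its success probability: at attempt $i$ it tries the unselected element maximizing $P^e_i\,(f(S \cup \{e\}) - f(S))$, and a failed attempt leaves $S$ unchanged. The sketch below uses a hypothetical coverage instance and a caller-supplied `prob(e, i)`; `estimate_value` is a simple Monte Carlo helper, also hypothetical.

```python
import random

SETS = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}   # hypothetical instance


def coverage(chosen):
    return len(set().union(*(SETS[e] for e in chosen)))


def myopic_selection(elements, f, prob, k, rng):
    """Myopic policy for the stochastic selection problem.

    prob(e, i): success probability P_i^e of attempting element e at attempt i.
    At each attempt, pick the unselected element with the largest expected gain
    prob(e, i) * (f(S + e) - f(S)); a failed attempt leaves S unchanged.
    """
    selected = frozenset()
    for i in range(k):
        remaining = [e for e in elements if e not in selected]
        if not remaining:
            break
        e = max(remaining, key=lambda x: prob(x, i) * (f(selected | {x}) - f(selected)))
        if rng.random() < prob(e, i):
            selected = selected | {e}
    return selected


def estimate_value(prob, k, trials=2000, seed=0):
    """Monte Carlo estimate of the myopic policy's expected coverage."""
    rng = random.Random(seed)
    return sum(coverage(myopic_selection(list(SETS), coverage, prob, k, rng))
               for _ in range(trials)) / trials
```

When every probability is 1 this reduces to the deterministic greedy rule for the cardinality matroid.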

As an aside, we note that if, in addition, one assumes that $P^e_t = C$, a constant for all $t$ and $e$, it is simple to demonstrate that the myopic heuristic is, in fact, an $e/(e-1)$-approximation. This may be demonstrated as a corollary to the original result of Nemhauser et al. (1978): one simply considers coupling the optimal and myopic schemes so that on each sample path, both schemes have an identical number of successful placements.

We now consider in some detail a practically relevant stochastic generalization of the AdWords Assignment problem (Fleischer et al. (2006)). The deterministic problem may be reduced to the maximization of a submodular function over a partition matroid (see Goundan and Schulz (2007)). We reduce our stochastic generalization to a submodular stochastic depletion problem.



6.1 Cost-per-Click AdWords Assignment 

Consider the following optimization problem faced by firms that serve ads on the internet. We are given a set of $N$ advertisers (indexed by $i$) and $K$ keywords (indexed by $k$). The $i$th advertiser has a budget $B_i\ (> 0)$ and submits to the firm a valuation $v_{i,k}$ for every keyword $k$. In every one of $T$ periods, a keyword from the set of $K$ keywords arrives according to some exogenous stochastic process. We assume that at most $b$ advertisers' ads can be assigned to the arriving keyword. We denote by $k_t$ the index of the keyword arriving at time $t$. Should an advertiser $i$ be assigned to an arriving keyword $k_t$ at time $t$, and if in addition his ad is clicked on, he pays the firm the minimum of $v_{i,k_t}$ and his remaining budget at time $t$; this payment is subtracted from his available budget. If the ad is not clicked on, then no payments are made. We assume that, should advertiser $i$ be assigned to keyword $k_t$ at time $t$, his ad is clicked on with probability $P_t^{i,k_t}$. Letting $V_{it}$ denote the random payment thus made by advertiser $i$ in the $t$th period, we are interested in devising an adaptive ad-to-keyword assignment scheme that maximizes $\mathbb{E}\big[\sum_t \sum_i V_{it}\big]$, that is, the expected revenues earned by the firm.



The above problem was considered in a deterministic offline setting by Fleischer et al. (2006), where it was assumed that $P_t^{i,k} = 1$ for all $t, i, k$ and, in addition, that the sequence $\{k_t\}$ of arriving keywords was specified a priori. The variant we consider here is an important generalization of that model, since in practice advertisers make payments only if their displayed ads are clicked on, which happens with some positive, but small, probability. In addition, our formulation allows us to capture exogenous advertiser arrivals to and departures from the system: we simply set $P_t^{i,k} = 0$ for all times $t$ prior to advertiser $i$'s arrival to the system and following his departure from it.

This problem is easily cast as a submodular stochastic depletion problem. In particular, we define an item type $m$ for every advertiser-keyword-time triple $(i_m, k_m, t_m)$ and assume a single item of each type, i.e., $x_{0,m} = x_m = 1$ for all $m$. The set of feasible activities, $\mathcal{A}$, is the set of all subsets of item types such that each subset has cardinality at most $C$ and contains at most one type specific to a given advertiser $i$. The probability that an item of type $m$ is depleted at time $t$, assuming one selects activity $A$, is given by

$$P_{t,m}(A) = P_t^{i_m,k_m}\, \mathbf{1}\{m \in A,\ t_m = t,\ k_t = k_m\}.$$

Finally, the reward function is $g(x_t, x_{t+1}, t) = w(x - x_{t+1}) - w(x - x_t)$, where $w : \mathbb{Z}^M \to \mathbb{R}_+$ is defined according to

$$w(x) = \sum_i \Big( B_i \wedge \sum_{m : i_m = i} v_{i_m,k_m}\, x_m \Big),$$

and thus satisfies Assumption 2. We finally note that the myopic subproblem is trivial; it corresponds to choosing the $C$ advertisers with the highest expected revenue. With Lemmas 2 and [Q], Theorem 2 thus yields:

Corollary 3. The myopic heuristic is a 2-approximation to Cost-per-Click AdWords Assignment.
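The following Python sketch illustrates the reward function $w$ and the myopic heuristic of Corollary 3 under simplifying assumptions that are ours, not the paper's: click probabilities do not vary with time (one value per advertiser-keyword pair), clicks are independent, and the keyword sequence is supplied as an already-sampled path. All function names (`revenue_w`, `myopic_choice`, `simulate_myopic`) are hypothetical. The myopic score relies on the observation that one additional click by advertiser $i$ raises $w$ by $\min(v_{i,k}, \text{remaining budget})$, which is exactly the payment rule.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: item m is an (advertiser, keyword, time) triple (i_m, k_m, t_m).
Item = Tuple[int, int, int]


def revenue_w(depleted: Sequence[int], items: Sequence[Item],
              budgets: Dict[int, float],
              values: Dict[Tuple[int, int], float]) -> float:
    """w(x) = sum_i min(B_i, sum_{m: i_m = i} v_{i_m,k_m} x_m) for x in {0,1}^M."""
    bids = {i: 0.0 for i in budgets}
    for (i, k, _), used in zip(items, depleted):
        bids[i] += values[(i, k)] * used
    return sum(min(budgets[i], bids[i]) for i in budgets)


def myopic_choice(k: int, remaining: Dict[int, float],
                  values: Dict[Tuple[int, int], float],
                  click_prob: Dict[Tuple[int, int], float],
                  capacity: int) -> List[int]:
    """Top-`capacity` advertisers by one-period expected revenue for keyword k.

    One more click by advertiser i raises w by min(v_{i,k}, remaining budget),
    so the expected one-period reward is click probability times that amount.
    """
    scored = []
    for i, left in remaining.items():
        gain = click_prob.get((i, k), 0.0) * min(values.get((i, k), 0.0), left)
        if gain > 0:
            scored.append((gain, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:capacity]]


def simulate_myopic(keywords: Sequence[int], budgets: Dict[int, float],
                    values: Dict[Tuple[int, int], float],
                    click_prob: Dict[Tuple[int, int], float],
                    capacity: int, rng: random.Random) -> float:
    """Realized revenue of the myopic policy along one sampled keyword sequence."""
    remaining = dict(budgets)
    revenue = 0.0
    for k in keywords:
        for i in myopic_choice(k, remaining, values, click_prob, capacity):
            if rng.random() < click_prob[(i, k)]:  # the ad is clicked
                payment = min(values[(i, k)], remaining[i])
                remaining[i] -= payment
                revenue += payment
    return revenue


if __name__ == "__main__":
    budgets = {0: 5.0, 1: 10.0}
    values = {(0, 0): 3.0, (1, 0): 4.0}
    items = [(0, 0, 0), (0, 0, 1), (1, 0, 0)]
    print(revenue_w([1, 1, 1], items, budgets, values))  # min(5, 6) + min(10, 4) = 9.0
    probs = {(0, 0): 0.3, (1, 0): 0.3}
    rng = random.Random(1)
    runs = [simulate_myopic([0] * 20, budgets, values, probs, 1, rng) for _ in range(1000)]
    print(sum(runs) / len(runs))  # Monte Carlo estimate of the myopic expected revenue
```

With click probabilities equal to 1 the simulation is deterministic: with budgets 5 and 10, valuations 3 and 4 on a single keyword, and $C = 1$, three arrivals yield $4 + 4 + 3 = 11$, which matches $w$ evaluated on the three consumed items.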

In our formulation, a feasible ads-to-keyword assignment was subject to a simple cardinality constraint: an arriving keyword could have at most $C$ ads assigned to it. We could instead consider using other, more complex constraints. In particular, in the formulation of Fleischer et al. (2006), every ad is associated with a rectangle of a specific height and width, and every arriving keyword with an available rectangular display area; a feasible assignment of ads to keywords is determined by a feasible packing of ad rectangles within the display rectangle. Using the $(2+\epsilon)$-approximation algorithm of Jansen and Zhang (2004) for max-weight rectangle packing to solve the myopic subproblem yields, via Theorem 2, a $(3+\epsilon)$-approximation guarantee for the myopic heuristic, which matches the best known approximation guarantee available for the original deterministic problem (see Goundan and Schulz (2007)).



7 Concluding Remarks 

In the present work we have introduced a general class of dynamic stochastic optimization problems: Stochastic Depletion Problems. We believe this to be an interesting class of problems: in spite of being fairly general, stochastic depletion problems frequently admit a simple, near-optimal myopic control policy. This paper presented general conditions that guarantee the near-optimality of a myopic control policy for a stochastic depletion problem and went on to verify these properties for broad families of stochastic depletion problems. This in turn yielded myopic approximation algorithms for a number of interesting dynamic optimization applications.

There are several directions that deserve continued study. From an algorithmic perspective, one may consider $k$-step lookahead policies as a generalization of the myopic (1-step lookahead) policies analyzed here. Such policies select, at every point in time, an action that is optimal for a problem with a horizon of precisely $k$ time steps. It would be interesting to understand whether, or under what conditions, such policies may be expected to dominate the myopic policy.
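A minimal, generic sketch of a $k$-step lookahead policy for a finite Markov decision model, assuming the model is given by two hypothetical callables, `actions(state)` and `outcomes(state, action)`, the latter returning (probability, reward, next state) triples. For a stochastic depletion problem the state would carry the time and the remaining inventory vector; the toy model at the end is purely illustrative and is not a depletion problem. With $k = 1$ the policy maximizes one-period expected reward, i.e., it is myopic.

```python
from functools import lru_cache
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = Hashable
# outcomes(state, action) -> [(probability, reward, next_state), ...]
Outcomes = Callable[[State, Action], List[Tuple[float, float, State]]]


def lookahead_policy(actions: Callable[[State], List[Action]],
                     outcomes: Outcomes, k: int) -> Callable[[State], Action]:
    """Policy that picks an action optimal for a k-period horizon (k >= 1)."""

    @lru_cache(maxsize=None)
    def value(state: State, depth: int) -> float:
        # Optimal expected reward over the next `depth` periods from `state`.
        acts = actions(state)
        if depth == 0 or not acts:
            return 0.0
        return max(q_value(state, a, depth) for a in acts)

    def q_value(state: State, action: Action, depth: int) -> float:
        # Expected immediate reward plus optimal value of the remaining depth - 1 periods.
        return sum(p * (r + value(nxt, depth - 1))
                   for p, r, nxt in outcomes(state, action))

    def policy(state: State) -> Action:
        # Assumes `state` is non-terminal (has at least one action).
        return max(actions(state), key=lambda a: q_value(state, a, k))

    return policy


# Hypothetical toy model: taking 1 now forgoes a reward of 5 one period later.
TOY = {
    "start": {"now": [(1.0, 1.0, "end")], "wait": [(1.0, 0.0, "mid")]},
    "mid": {"take": [(1.0, 5.0, "end")]},
    "end": {},
}

myopic = lookahead_policy(lambda s: list(TOY[s]), lambda s, a: TOY[s][a], k=1)
two_step = lookahead_policy(lambda s: list(TOY[s]), lambda s, a: TOY[s][a], k=2)
print(myopic("start"), two_step("start"))  # now wait
```

The $k$-period values are recomputed from the current state at each decision epoch (a rolling horizon); memoization is per policy, so its cost grows with the number of distinct (state, depth) pairs visited.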

In addition to the applications in the preceding sections, it would be interesting to explore other dynamic stochastic optimization problems that may be studied either within our framework or with slight modifications to it. For instance, the generalized assignment problem (Shmoys and Tardos (1993)) is known to reduce to the maximization of a submodular function over a partition matroid. A natural stochastic generalization of this problem, one that would allow for a number of interesting applications, would make the successful placement of an item in a bin stochastic. Unfortunately, this particular stochastic generalization does not reduce to a stochastic depletion problem, but it is nonetheless very similar to one.

Another broad issue is identifying other families of stochastic depletion problems that satisfy the VFM and IR properties or, in another direction, identifying conditions under which we may not expect one of those properties to be satisfied. Yet another issue is the optimality of our approximation schemes: for deterministic variants of several of the application problems considered in this work, such as the submodular maximization problems over matroids, there exist (typically fairly complex) offline algorithms that admit an approximation ratio of $\frac{e}{e-1}$ (see Calinescu et al. (2007)). This guarantee is known to be optimal; that is, no efficient approximation algorithm with a superior guarantee exists unless P = NP. It would be interesting to understand whether an approximation ratio of 2 is optimal in some sense for either of the families of problems for which we have established that guarantee in this paper. In any case, given that the best approximation guarantee we may expect is a factor of $\frac{e}{e-1}$, it is remarkable that a simple myopic scheme comes so close to achieving that guarantee, and that its factor-of-2 guarantee may be established in the generality of the stochastic depletion framework.
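For a sense of scale, in the ratio convention used throughout (optimal value divided by guaranteed value), the two factors compare as follows:

$$\frac{e}{e-1} \approx 1.582 < 2, \qquad \text{equivalently} \qquad 1 - \frac{1}{e} \approx 0.632 > \frac{1}{2},$$

so the fraction of the optimal value guaranteed by the myopic scheme falls short of the best possible offline fraction by roughly $0.13$.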
A Miscellaneous Technical Proofs

Lemma 3. (Immediate Rewards) For Linear Decaying Stochastic Depletion problems, we have, for all $s \in S$ and $a \in \times_m \{0, 1, \ldots, x(s)_m\}$,

$$J^*(s) \le \sum_m a_m w_{t(s),m} + J^*(S_a(s)).$$

Proof: Consider using an optimal policy starting at state $s$. Let us partition the initial set of jobs into a set of 'fictitious' and 'real' jobs; we assume that we begin with $a_m$ fictitious jobs of type $m$ and $x(s)_m - a_m$ real jobs of type $m$. This partitioning serves purely as a labeling of jobs and does not impact the system in any fashion. In particular, if at some point in time $t$ we have $x^f_{t,m}$ and $x^r_{t,m}$ fictitious and real jobs of type $m$, respectively, then using activity set $A$ results in the completion of $X^f_{t,m}$ and $X^r_{t,m}$ fictitious and real jobs, respectively, where $X^f_{t,m}$ is a Binomial$(x^f_{t,m}, P_{t,m}(A))$ random variable and $X^r_{t,m}$ is a Binomial$(x^r_{t,m}, P_{t,m}(A))$ random variable (so that $X^f_{t,m} + X^r_{t,m} = X_{t,m}$). The revenues earned are $\sum_m X^f_{t,m} w_{t,m}$ and $\sum_m X^r_{t,m} w_{t,m}$, and we are left with $x^f_{t,m} - X^f_{t,m}$ and $x^r_{t,m} - X^r_{t,m}$ fictitious and real jobs, respectively.

Denote by $J^{f,*}(s)$ the expected reward-to-go under an optimal policy starting at state $s$ earned from the completion of fictitious jobs. Likewise, we define $J^{r,*}(s)$ as the expected reward-to-go under an optimal policy starting at state $s$ earned from the completion of real jobs. By construction, $J^{f,*}(s) + J^{r,*}(s) = J^*(s)$. Since at best our scheduling policy can exhaust all fictitious jobs, and since $w_{t,m}$ is non-increasing in $t$ for all $m$, we have $J^{f,*}(s) \le \sum_m a_m w_{t(s),m}$. Now, $J^{r,*}(s)$ may be viewed as the reward-to-go under some admissible policy $\pi$ starting at state $S_a(s)$. Noting that $x(S_a(s))_m$ is precisely the initial number of 'real' jobs of type $m$, we then have $J^{r,*}(s) = J^{\pi}(S_a(s)) \le J^*(S_a(s))$. Consequently,

$$J^*(s) = J^{f,*}(s) + J^{r,*}(s) \le \sum_m a_m w_{t(s),m} + J^*(S_a(s)),$$

which is the result. □
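As a numerical sanity check of Lemma 3, the sketch below computes $J^*$ by backward induction on a tiny instance and verifies the inequality for every state and every admissible $a$. The instance is hypothetical and chosen by us: two job types, three periods, an activity that serves exactly one type per period, each job of the served type completing independently with a fixed probability (so completions are binomial, as in the proof), and per-job rewards `W[t][m]` that are non-increasing in $t$. Serving a type with no jobs left plays the role of idling.

```python
from functools import lru_cache
from itertools import product
from math import comb

# Hypothetical toy instance (not from the paper): two job types, three periods.
P = (0.6, 0.3)                            # per-job completion probability of the served type
W = ((5.0, 4.0), (3.0, 4.0), (1.0, 2.0))  # W[t][m]: reward per completed job, non-increasing in t
T = len(W)


def binom_pmf(n: int, j: int, p: float) -> float:
    """Probability that exactly j of n independent jobs complete."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def serve_value(t: int, x: tuple, m: int) -> float:
    """Expected reward-to-go from serving type m at time t, then acting optimally."""
    total = 0.0
    for done in range(x[m] + 1):
        nxt = list(x)
        nxt[m] -= done
        total += binom_pmf(x[m], done, P[m]) * (done * W[t][m] + J(t + 1, tuple(nxt)))
    return total


@lru_cache(maxsize=None)
def J(t: int, x: tuple) -> float:
    """Optimal expected reward-to-go J*(s) for the state s = (t, x)."""
    if t >= T or sum(x) == 0:
        return 0.0
    return max(serve_value(t, x, m) for m in range(len(x)))


def lemma3_holds(x_max: tuple, tol: float = 1e-9) -> bool:
    """Check J*(t, x) <= sum_m a_m W[t][m] + J*(t, x - a) for all t, x <= x_max, a <= x."""
    for t in range(T):
        for x in product(*(range(n + 1) for n in x_max)):
            for a in product(*(range(n + 1) for n in x)):
                rest = tuple(xi - ai for xi, ai in zip(x, a))  # S_a(s): same time, a jobs removed
                bound = sum(ai * W[t][m] for m, ai in enumerate(a)) + J(t, rest)
                if J(t, x) > bound + tol:
                    return False
    return True


print(lemma3_holds((2, 2)))  # True
```

The tolerance only absorbs floating-point rounding; for $a = 0$ the bound reduces to $J^*(s)$ itself, so the inequality is tight there.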
Lemma 5. If $P_t^u$ is constant in $t$ ($P_t^u = C_u$) for all $u$, then $J^{\pi^g}(s_0) = J^*(s_0)$ for all $s_0 \in S$.

Proof: Note that since the processes $P^u$ are deterministic here, we may without loss restrict attention to policies that are functions of only time $t$ and $(x^1, x^2, \ldots, x^U)$. Let $w^u_i = r_m$ for item $m = (i, u, T_m, d_m)$. We define the set of myopic packets as $\mathcal{P}_0 = \arg\max_u \{C_u w^u_{x^u(s_0)}\}$. Let us assume, for the sake of contradiction, that in state $s_0$ no optimal policy transmits a packet in $\mathcal{P}_0$. Let $\pi^*$ be an optimal policy; then $\pi^*(s_0) \notin \mathcal{P}_0$.

Define a policy $\tilde{\pi}$ according to

$$\tilde{\pi}(s_0) = \pi^g(s_0), \qquad \tilde{\pi}(s) = \pi^*(f(s)) \quad \forall s \ne s_0,$$

where $f : S \to S$ is defined according to $x^{\pi^g(s_0)}(f(s)) = x^{\pi^g(s_0)}(s_0)$, $x^j(f(s)) = x^j(s)$ for all $j \ne \pi^g(s_0)$, and $t(f(s)) = t(s) - 1$.

Further, define a policy $\hat{\pi}$ according to

$$\hat{\pi}(s) = \pi^*(g(s)) \quad \forall s,$$

where $g : S \to S$ is defined according to $x(g(s)) = x(s)$ and $t(g(s)) = t(s) + 1$.

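Let $\tau = \min\{t \ge 1 : \tilde{\pi}(s_t) \in \mathcal{P}_0\}$ (set $\tau$ to $\infty$ if the set is empty), and consider using policy $\tilde{\pi}$ for $t < \tau$ and policy $\hat{\pi}$ thereafter. We call this policy $\pi'$. Denote by $R^{\pi'}_t$ and $R^{\pi^*}_t$ the random rewards earned in the $t$th time step under the $\pi'$ and $\pi^*$ policies, respectively, so that $J^*(s_0) = \mathrm{E}\left[\sum_t R^{\pi^*}_t\right]$ and $J^{\pi'}(s_0) = \mathrm{E}\left[\sum_t R^{\pi'}_t\right]$.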

Now observe that, by our construction, $\mathrm{E}[R^{\pi'}_0] \ge \mathrm{E}[R^{\pi^*}_0]$, $R^{\pi'}_t = R^{\pi^*}_{t-1}$ for $0 < t < \tau$, and $R^{\pi'}_t = R^{\pi^*}_t$ for $t > \tau$. If $\tau < \infty$, then it immediately follows that $J^*(s_0 \mid \tau < \infty) = J^{\pi'}(s_0 \mid \tau < \infty)$; if $\tau = \infty$, then upon noting that $\mathrm{E}[R^{\pi'}_0] \ge \mathrm{E}[R^{\pi^*}_0]$, we have $J^{\pi'}(s_0 \mid \tau = \infty) \ge J^*(s_0 \mid \tau = \infty)$. Consequently, $J^{\pi'}(s_0) \ge J^*(s_0)$, so that $\pi'$ is an optimal policy as well. This contradicts our assumption that no optimal policy transmits a packet in $\mathcal{P}_0$ at $t = 0$. We may thus assume without loss that an optimal policy transmits a packet in $\mathcal{P}_0$ in the first time step. This suffices for the proof. □